ABOUT THIS CHAPTER


This chapter describes the Script Manager, a set of general text manipulation 
routines that let applications function correctly with non-Roman writing systems 
such as Japanese and Arabic, as well as Roman (or Latin-based) alphabets such as 
English, French, and German. The Script Manager works with one or more Script 
Interface Systems, each of which contains the rules for a specific method of 
writing. 


This chapter also documents version 2.0 of the Script Manager. It includes 
extended date and time utility routines, general-purpose number formatting 
routines, and additional text manipulation routines. 


Reader's guide: Most applications do not need to call the Script Manager 
routines directly, since they can handle text by means of 
TextEdit, which functions correctly with the Script Manager. 
Applications that need to call the new routines are those 
that directly manipulate text, such as word processors or 
programs that parse ordinary language. 


You should already be familiar with 


• QuickDraw's text manipulation functions
• the International Utilities package
• the Binary-Decimal Conversion package


It may also be helpful to have a general understanding of how the Font Manager 
provides font support for QuickDraw and how TextEdit handles word selection and 
justification. 


The process of adapting an application to different languages, called 
localization, is made easier if certain principles are kept in mind when you 
create the application. For example, you should place quoted strings in 
resources separate from program code, and you should avoid implicit assumptions 
about the language that the application uses, such as the number of characters 
in its alphabet. General guidelines for writing applications that are easy to 
localize are presented in Human Interface Guidelines, available through APDA. 
They are summarized in the "Compatibility Guidelines" chapter.


ABOUT THE SCRIPT MANAGER


The Script Manager is a set of extensions to the standard Macintosh Toolbox and 
operating system that does two things: 


• It provides standard, easy-to-use tools for the sophisticated
manipulation of ordinary text.
• It makes it easy to translate an application into another writing system.


A script is a writing system. Roman scripts are writing systems whose alphabets 
have evolved from Latin. Non-Roman scripts (such as Japanese, Chinese, and
Arabic) have quite different characteristics. For example, Roman scripts
generally have fewer than 256 characters, whereas the Japanese script contains
more than 40,000. Characters of Roman scripts are relatively independent of 
each other, but Arabic characters change form depending on surrounding 
characters. 


For example, Figure 1 shows how Key Caps looks in Arabic script. 


Figure 1-Key Caps in Arabic Script


The Script Manager is the low-level software that enables Macintosh applications 
to work with such different scripts. It includes utilities and initialization 
code to create an environment in which scripts of all kinds can be handled. In 
order for an application to use a particular script, a Script Interface System 
to support that script must also be present. All the currently available Script
Interface Systems are written by Apple. Macintosh computers normally use the
Roman script, so the Roman Interface System (RIS) is in the System file and
always present. On some models it may be in ROM. Other Script Interface Systems
are the Kanji Interface System (KIS, also called KanjiTalk), which allows
applications to write in Japanese; the Arabic Interface System (AIS); and the
Hanzi Interface System (HIS) for Chinese.


A Script Interface System typically provides the following: 


• fonts for the target language

• keyboard mapping tables

• special routines to perform character input, conversion,
sorting, and text manipulation

• a desk accessory utility for system maintenance and control


The Script Manager calls a Script Interface System to perform specific procedure 
calls for a given script. How a typical call (in this case, Pixel2Char) is 
passed from an application through the Script Manager to a Script Interface 
System and back is shown in Figure 2. 


[Figure: the application's call passes to the Script Manager, which uses the
font script to determine which Script Interface System to call; the Roman and
Arabic Interface Systems are among those shown.]


Figure 2-Example of a Procedure Call


In many cases the versatility provided by Script Interface Systems allows 
applications to be localized for non-Roman languages with no change to their 
program code (assuming they were written to permit localization to Roman scripts).
Up to 64 different Script Interface Systems can be installed at one time on the 
Macintosh, allowing an application to switch back and forth between different 
scripts. When more than one Script Interface System is installed, an icon 
symbolizing the script in use appears at the right side of the menu bar. 


The Script Manager provides the functions needed to extend Macintosh's text 
manipulation capabilities beyond any implicit assumptions that would limit it to 
Roman scripts. The areas in which these limitations appear are: 


• Character set size. Large character sets, such as Japanese, require
two-byte codes for computer storage in place of the one-byte codes that
are sufficient for Roman scripts. Script Manager routines permit 
applications to run without knowing whether one-byte or two-byte codes 
are being used. 

• Writing direction. The Script Manager provides the capability to write
from right to left, as required by Arabic, Hebrew, and other languages, 
and to mix right-to-left and left-to-right directions within lines and 
blocks of text. 

• Context dependence. Context dependence means that characters may be
modified by the values of preceding and following characters in the input 
stream. In Arabic, for example, many characters change form depending on 
other characters nearby. Context analysis is usually handled by the 
appropriate Script Interface System under the control of the Script 
Manager. 

• Word demarcation. Words in Roman scripts are generally delimited by
spaces and punctuation marks. In contrast, Japanese scripts may have no
word delimiters, so the Script Manager provides a more sophisticated 
method of finding word boundaries. TextEdit calls may be intercepted by 
the Script Manager, which calls the appropriate Script Interface System 
routines to perform selection, highlighting, dragging, and word wrapping 
correctly for the current script. 

• Text justification. Justification (spreading text out to fill a given
line width) is usually performed in Roman text by increasing the size of 
the interword spaces. Arabic, however, inserts extension bar characters 
between joined characters and widens blank characters to fill any 
remaining gap. The Script Manager provides routines that take these 
alternate justification methods into account when drawing, measuring, or 
selecting text. 


The Script Manager 2.0 release extends the tools and capabilities of developers 
on the Macintosh for three areas: text, dates and numbers. In addition, some 
minor bugs were fixed and performance enhancements incorporated. 


The new text routines support lexically interpreting different scripts (for
example, in macro languages); allotting justification to different format runs
within a line; ordering format runs properly with bidirectional text (Hebrew
and Arabic); quickly separating Roman from non-Roman text; and determining
word wrap in text processing. The performance of the International Utilities
text comparison routines was significantly improved, by amounts ranging from
25% to 94%.


The Macintosh date routines are extended to provide a larger range (roughly 35 
thousand years), and more information. This extension allows programs that need 
a larger range of dates to use system routines rather than produce their own, 
which may not be internationally compatible. The programmer can also access the 
stored location (latitude and longitude) and time zone of the Macintosh from 
parameter RAM. The Map cdev gives users the ability to change and reference 
these values. 


The new number routines supplement SANE, allowing applications to display 
formatted numbers in the manner of Microsoft Excel or Fourth Dimension, and to 
read both formatted and simple numbers. The formatting strings allow natural 
display and entry of numbers and editing of format strings even though the 
original numbers and the format strings were entered in a language other than
that of the final user. 


@ SpInside Macintosh * Version 1.0 * November 1989 * Apple Computer 


Some of the following 2.0 routines have parameter blocks with reserved fields. 
These fields must be zeroed. 


In general, the additional routines are handled by the Script Manager rather 
than script interface systems. The three exceptions are FindScriptRun, 
PortionText, and VisibleLength, which are handled by the individual script
systems (such as Roman). The version of the Script Manager can be checked 
before using any of these routines, to make sure that it is Script Manager 2.0 
(version is $0200 or greater). For compatibility, all Script Systems test the 
version of the Script Manager and do not initialize if the major version number 
(first byte) is greater than they expect. 
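
The version tests described above can be sketched in C as follows. This is a
minimal illustration rather than Toolbox code: the function names are
hypothetical, and the version word is passed in as a parameter because the
way an application obtains it is outside the scope of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not Toolbox calls. */

/* The major version number is the first (high-order) byte of the word. */
uint8_t sm_major_version(uint16_t version)
{
    return (uint8_t)(version >> 8);
}

/* True when the 2.0 routines may be called (version $0200 or greater). */
bool sm_has_version_2_routines(uint16_t version)
{
    return version >= 0x0200;
}

/* Mirrors the rule a script system applies before initializing:
   it does not run when the major version is greater than it expects. */
bool script_system_may_initialize(uint16_t sm_version, uint8_t expected_major)
{
    return sm_major_version(sm_version) <= expected_major;
}
```

Under these assumptions, a script system that expects major version 1 would
initialize with a version word of $01FF but not with $0200.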


For testing only, the version number in INIT 2 can be changed in ResEdit in the 
resource header to enable those systems to run; the header has the following 
format: 


60xx    Branch
xxxx    Flags word
494E    Resource type (INIT)
4954
0002    Resource number (2)
02xx    Script Manager version: change to 01FF for testing


For an old script, the three routines FindScriptRun, PortionText, and 
VisibleLength will not work at all. In addition, the 'itl4' resource (see 
below) for the script will not be present, so the IntlTokenize and number 
formatting routines will not work properly for the particular script's features. 


The results returned from the new function calls are error and status codes 
which are found in the MPW 3.0 header and interface files. 


Note that in the following text, the term "language" generally refers to a
natural language rather than a programming language. 


The interface files for the Script Manager 2.0 routines are available in MPW 3.0 
and later releases.


TEXT MANIPULATION


Applications that do extensive text handling and analysis, such as word 
processors, may need to use Script Manager routines directly and work in close 
interaction with Script Interface Systems. This section describes some 
potential problems with such applications and provides general guidelines for 
handling them. 


Determining the Script in Use 


The characteristics of different scripts require that text manipulation 
functions be handled according to the script in use. Every script has a unique 
identification number, as shown in the following list: 


Constant       Value   Script

smRoman          0     Normal ASCII alphabet
smKanji          1     Japanese
smChinese        2     Chinese
smKorean         3     Korean
smArabic         4     Arabic
smHebrew         5     Hebrew
smGreek          6     Greek
smRussian        7     Cyrillic
smReserved1      8     Reserved
smDevanagari     9     Devanagari
smGurmukhi      10     Gurmukhi
smGujarati      11     Gujarati
smOriya         12     Oriya
smBengali       13     Bengali
smTamil         14     Tamil
smTelugu        15     Telugu
smKannada       16     Kannada
smMalayalam     17     Malayalam
smSinhalese     18     Sinhalese
smBurmese       19     Burmese
smKhmer         20     Cambodian
smThai          21     Thai
smLaotian       22     Laotian
smGeorgian      23     Georgian
smArmenian      24     Armenian
smMaldivian     25     Maldivian
smTibetan       26     Tibetan
smMongolian     27     Mongolian
smAmharic       28     Ethiopian
smSlavic        29     Non-Cyrillic Slavic
smVietnamese    30     Vietnamese
smSindhi        31     Sindhi
smUninterp      32     Uninterpreted symbols (such as MacPaint palette symbols)


The Script Manager looks for one of these values in the font field of the 
current grafPort (thePort) to determine which script the application is using. 
The script specified by the font of thePort is referred to as the font script. 
For example, if thePort's font is Geneva, the font script will be Roman. If 
thePort's font is Kyoto, the font script will be Japanese. If the mapping from 
font to script results in a request for a Script Interface System that is not 
available, the font script defaults to Roman. 


Note: Be sure to set the font in the current grafPort correctly so the Script 
Manager will know what script it is working with. Otherwise the results 
it returns will be meaningless (for example, if a block of Arabic text 
is treated as if it were kanji). 


The font script is not to be confused with the key script, which is maintained 
by the system. The key script value determines which keyboard layout and input 
method to use, but has no effect on characters drawn on the screen or on the 
operations performed by the Script Manager routines. The key and font scripts 
are not always the same. For example, while an international word processing 
application is using the Arabic Interface System for keyboard input, it may also 
be drawing kanji and Roman text on the screen. For further information about 
keyboard character translation, see the System Resource File chapter.


Drawing and Measuring 


The drawing and measuring of Roman and non-Roman text is handled correctly by 
standard Toolbox routines working in conjunction with the current Script 
Interface System and the Script Manager. For example, the QuickDraw routine 
TextWidth can always be used to find the width of a given line of text, since 
the Script Interface System that is currently in use modifies the routine if 
necessary to give proper results. 


For an application to be able to handle non-Roman as well as Roman scripts, 
however, it is important for text to be drawn and measured in blocks, rather 
than as individual characters. 


Warning: Since non-Roman scripts can have multibyte characters, breaking apart 
a string into individual bytes will have unpredictable results. 
This is not a good idea even on standard Roman systems: scaled or 
fractional-width characters cause incorrect results if measured 
and/or drawn one at a time. Also, it takes longer to measure the 
widths of several characters one at a time (using CharWidth) than it 
does to measure them together (using TextWidth or MeasureText). 


In addition to supporting the standard trap routines for drawing and measuring 
text, the Script Manager provides routines for handling text that is fully 
justified. These routines behave the same as the standard drawing and measuring 
routines, but they have the extra ability to spread the text out evenly on the 
line. 


Parsing


One problem in evaluating or searching non-Roman text is that the low byte of a
double-byte character may be treated as though it were a valid character. For 
example, 93 (the ASCII code for a right bracket) is the value of the low byte 
for up to 60 double-byte kanji characters. If an application uses this 
character as a delimiter and searches through double-byte text, it can produce 
invalid results. To prevent invalid character evaluation results, applications 
should use the Script Manager routine CharByte to determine whether the 
character in question is one byte of a double-byte character. 
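
The delimiter problem can be illustrated with a small, self-contained C
sketch. It does not call the Toolbox: char_byte_sketch is a hypothetical
stand-in for CharByte, and is_lead_byte assumes Shift-JIS-style first-byte
ranges purely for illustration. On a real system the Script Interface System
makes this decision.

```c
#include <stddef.h>

/* Assumed first-byte test for a Shift-JIS-style two-byte encoding. */
int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Stand-in for CharByte, limited to one- and two-byte characters:
   returns -1 for a first byte, 1 for a last byte, 0 for a single byte.
   It walks from the start of the buffer so that a second byte is never
   mistaken for the start of a new character. */
int char_byte_sketch(const unsigned char *buf, size_t len, size_t offset)
{
    size_t i = 0;
    while (i < len) {
        if (is_lead_byte(buf[i]) && i + 1 < len) {
            if (offset == i)     return -1;
            if (offset == i + 1) return 1;
            i += 2;
        } else {
            if (offset == i) return 0;
            i += 1;
        }
    }
    return 0;
}

/* Find a one-byte delimiter, skipping bytes that are the second half
   of a two-byte character. Returns the offset, or -1 if not found. */
long find_delimiter(const unsigned char *buf, size_t len, unsigned char delim)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == delim && char_byte_sketch(buf, len, i) == 0)
            return (long)i;
    }
    return -1;
}
```

With the four bytes $83, $5D, $61, $5D, a plain byte search for 93 ($5D)
would stop at offset 1, inside the two-byte character, whereas this sketch
reports the true delimiter at offset 3. The rescan from the start of the
buffer keeps the sketch short; it is not meant as an efficient design.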


A related problem occurs when text is broken up into arbitrary chunks. This is 
a problem for scripts whose characters are more than one byte long, or that 
change their appearance based on surrounding context. The best solution is to 
avoid breaking text into physical chunks. If it is necessary to draw the text 
in sections, it should be done using the clipping facility of QuickDraw. 


For example, suppose a graphics program needs to draw a string that has been 
rotated to 45°, and it must use a temporary buffer to draw the original text 
before drawing the rotated text on the screen. The solution is to create a 
grafPort whose bit image is the buffer and set the clipping region or bitmap 
bounding rectangle to the dimensions of the buffer. The text can then be drawn 
into the grafPort, with the starting pen position set up so that the desired 
segment of the text appears in the buffer. The text can be drawn in the buffer 
as many times as is necessary, with a different starting pen position for each 
segment, until the entire text has been drawn on the screen. 


This method lets the Script Interface System correctly draw the characters each 
time, regardless of any double-byte character or context problems. It also 
ensures that fractional width characters will be drawn correctly. 


Character Codes 


An application may, for some reason, need to use a character code or range of 
codes to represent non-character data (such as field delimiters). Character 
codes below $20 are never affected by the Script Interface System, and therefore 
can be used safely for these special purposes. Note, however, that certain 
characters in this range are already assigned special meanings by parts of the 
Macintosh Toolbox (TextEdit) or certain languages (C). The following low-ASCII 
characters should be avoided: 


Character           ASCII Code

Null                0
Enter               3
Backspace           8
Tab                 9
Line feed           10
Carriage return     13
System characters   17, 18, 19, 20
Clear               27
Cursor keys         28, 29, 30, 31


Key-Down Event Handling


Double-byte characters are passed to an application by two key-down events. 
With double-byte scripts, the Script Interface System extends TextEdit as 
necessary to handle character buffering. 


Text-processing routines should check to see whether a key-down event is the 
first byte of a double-byte character by using CharByte. If so, they should 
buffer the first byte and wait for the second byte. When the second byte 
arrives, the character can be inserted in the text and drawn correctly. 
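
The buffering step can be sketched in C as a small state machine. The names
below are hypothetical, and sample_is_first_byte assumes Shift-JIS-style
first-byte ranges only so that the sketch is self-contained; a real
application would make that decision with CharByte.

```c
#include <stdbool.h>

/* Holds the first byte of a two-byte character between key-down events. */
typedef struct {
    bool          pending;     /* true while a first byte is buffered */
    unsigned char first_byte;  /* the buffered first byte */
} KeyAssembler;

/* Assumed classifier, for illustration only. */
bool sample_is_first_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Feed one key-down byte. Returns the number of bytes written to out:
   0 while waiting for a second byte, otherwise 1 or 2. */
int key_assembler_feed(KeyAssembler *ka, unsigned char byte,
                       bool (*is_first_byte)(unsigned char),
                       unsigned char out[2])
{
    if (ka->pending) {              /* the second byte has arrived */
        out[0] = ka->first_byte;
        out[1] = byte;
        ka->pending = false;
        return 2;
    }
    if (is_first_byte(byte)) {      /* hold it until the next key-down */
        ka->first_byte = byte;
        ka->pending = true;
        return 0;
    }
    out[0] = byte;                  /* ordinary one-byte character */
    return 1;
}
```

The caller would insert and draw text only when the function returns a
nonzero count, so a two-byte character is never drawn half finished.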


TextEdit performance can be improved significantly, even with Roman scripts, if 
the application program buffers characters. Each time through the event loop, 
if the current event is a keyDown or autoKey, place the byte in a buffer. 
Whenever the event is anything else (including the null event), insert the 
buffer (call TEDelete to remove the current selection range, call TEInsert to 
add the buffered characters, then clear the buffer). 


Writing Direction 


The standard writing direction at a given time is determined by the low-memory 
global teSysJust. Setting teSysJust is handled by the Script Interface System, 
which provides user control through a desk accessory. For Roman text teSysJust 
is set to 0; if it is -1, the user (or the Script Interface System) has
specified right-to-left as the standard system direction. The value of this 
global has two results: 


• TextEdit, the Menu Manager, and the Control Manager's radio buttons and
check boxes will all justify on the right instead of the left. For 
compatibility, the meaning of teJustLeft (0) changes. In that case, 0 
causes the text to be right-justified, so teJustLeft actually represents 
default justification. The parameter teForceLeft should be used if the 
application really needs to force the justification to be left. This is 
also the case for the TextEdit routine TextBox. 


• Bidirectional fonts, such as Arabic and Hebrew, will draw blocks from
right to left. Within blocks of Arabic or Hebrew, QuickDraw is patched to 
order text from right to left. That is, text is drawn from the given 
penLoc towards the right as normal, but the order of the characters within 
that text may be reversed. 


When constructing dialog boxes, if the user sets teSysJust through the Script 
Interface System desk accessory, everything in dialog boxes will be lined up on 
the right edges of the individual item rectangles. If a column of buttons, for 
example, is supposed to line up in either writing direction, both the left and 
the right boundaries should be aligned. 


When a word processor displays different text fonts and styles within a line,
the pieces should be drawn (and measured) in different order, depending on the
teSysJust value.


Partitioning Text


You should be careful when text needs to be partitioned or analyzed. With the 
Script Manager, bytes may be mapped to different fonts in order to display non- 
Roman characters. This mapping is also not fixed, because it can depend on the 
context around the byte. Moreover, with Japanese and Chinese double-byte 
characters, a single byte may be only part of a character. Here is a list of 
situations requiring extra care:


• Applications should not assume that a given character code will always
have the same width. With certain scripts, for example, using the new 
Font Manager cached width tables may give inaccurate results. The new 
QuickDraw routine MeasureText will return correct results with all 
current scripts. 


• Applications should not assume that a monospaced font always produces
monospaced text. For example, the user might insert a wide Japanese 
character within a line of Monaco text. 


• Applications should be capable of processing zero-width characters.
Zero-width characters should never be divided from the previous character 
in the text when partitioning text. When truncating a string to fit into 
a horizontal space, the correct algorithm is to truncate from the end of 
the string toward the beginning, one byte at a time, until the total width 
is small enough. This avoids cutting text before a zero-width character. 


• Script Manager utility routines should be used any time a line of text is
to be partitioned, as in selection, searching, or word wrapping. If a 
line is to be truncated within a cell, for example, Pixel2Char should be 
used to find the point where the line should be broken. If a line of text 
is broken into pieces, as when a word processor displays different text 
fonts and styles within a line, Pixel2Char and Char2Pixel can be applied 
to each piece in succession to find the character offset or pixel width. 


• Applications should use the FindWord routine for word selection and word
wrapping, since some languages do not use spaces between words. TextEdit
breaks words properly because it is extended by the Script Interface 
System to handle the current script. 
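
The truncation rule given in the list above (work backward from the end of
the string until the text fits) can be sketched in C. The MeasureFn callback
is a hypothetical stand-in for a Toolbox measuring call such as TextWidth,
and toy_measure is an invented width table used only to make the sketch
self-contained.

```c
#include <stddef.h>

/* Stand-in for a measuring call: returns the pixel width of the
   first len bytes of text. */
typedef int (*MeasureFn)(const unsigned char *text, size_t len);

/* Returns how many leading bytes of text fit within max_width pixels,
   shortening from the end one byte at a time. */
size_t truncate_to_width(const unsigned char *text, size_t len,
                         int max_width, MeasureFn measure)
{
    while (len > 0 && measure(text, len) > max_width)
        len--;
    return len;
}

/* Toy measure for experimentation only (an assumption, not a real font):
   byte 0x01 is treated as a zero-width mark, every other byte as 6 pixels. */
int toy_measure(const unsigned char *text, size_t len)
{
    int width = 0;
    for (size_t i = 0; i < len; i++)
        if (text[i] != 0x01)
            width += 6;
    return width;
}
```

For the bytes 'a', 'b', $01, 'c' and a limit of 12 pixels, the sketch keeps
three bytes, so the zero-width mark stays attached to the 'b' before it. A
scan from the beginning that stopped as soon as the limit was reached would
keep only two bytes and cut the text just before the zero-width character.
With a double-byte script, the resulting length would presumably also need a
CharByte check so that it does not fall between the two bytes of a character.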


Numeric Strings 


The characters that can appear in a numeric string depend on the script in which 
the string is written. Applications that want to check ASCII strings to see if 
they are valid numeric fields, or convert ASCII strings into their equivalent 
numeric values, should use the SANE routines to do so. These routines will 
always return the correct result, regardless of the script in which the number is
written. SANE routines are described in the Apple Numerics Manual. 


Note: As with the international sorting and date/time routines, the 
interpretation of numbers depends on the font for the current port. 
See "Script Information", later in this chapter.


USING THE SCRIPT MANAGER


This section outlines the routines provided by the Script Manager and explains 
some of the basic concepts you need to use them. The actual routines are 
presented later in this chapter. 


Script Information 


FontScript tells your application to which script the font of the current 
grafPort belongs. IntlScript is similar to FontScript but is used by the 
International Utilities package to determine the number, date, time, currency, 
and sorting formats. 


Note: Application programs can examine the international parameter blocks 
that determine the number, date, time, currency, and sorting formats 
by calling the IUGetIntl routine in the International Utilities package. 
Applications should not try to access the international parameter blocks 
directly (via the Resource Manager routine GetResource). 


KeyScript is used to change the keyboard script, which determines the layout of
the keyboard. Word processors and other text-intensive programs should use this 
routine to change the keyboard script when the user changes the current font. 
For example, if the user selects Al Qahira (Cairo) as the current font or 
selects a run of text that uses the Al Qahira font, the application should set 
the keyboard script to Arabic. This can be done by using FontScript to find the 
script for the font, then using KeyScript to set the keyboard. 


Note: With many scripts, the user can also change the keyboard script by 
using the script desk accessory. Alternatively, your application can 
check the keyscript (using GetEnvirons) in its main event loop; if it 
has changed, the application can set the current font to the system font 
of the new keyscript (determined by a call to GetScript). This saves the 
user from having to do it manually. 


Character Information 

With scripts that use two-byte characters, such as kanji, it is necessary to be 
able to determine what part of a character a single byte represents. CharByte 
tells you whether a particular byte is the first or second byte of a two-byte 
character, or a single-byte character code. 


Here is an example of adding an extra step to a search procedure, similar to a 
check for whole words, to handle double-byte characters: 


   {Search for text at keyPtr with size keySize}

   done := false;
   newLocation := -1;

   repeat
      newLocation := Munger(mainHandle, newLocation+1,
                            keyPtr, keySize, nil, 0);   {find the raw text}
      if newLocation < 0 then done := true

      {only use CharByte when ScriptManager is installed}
      else if scriptsInstalled <= 1 then done := true
      else if CharByte(mainHandle^, newLocation) <= 0 then done := true
      {note that CharByte doesn't touch the heap}

   until done;

   if newLocation >= 0 then {we really got it, so do something}


To make an extra test for whole words, the following code can be inserted
instead of the done := true statement after CharByte:


   if not testWord then done := true      {if no word testing}
   else begin                             {test whole word}
      HLock(mainHandle);                  {FindWord may touch heap}
      FindWord(mainHandle^, GetHandleSize(mainHandle),
               newLocation, false, nil, myOffsets);
      if myOffsets[0] = newLocation then
         if myOffsets[1] = newLocation+keySize then done := true;
      HUnlock(mainHandle);                {restore}
   end;                                   {whole word test}


The CharType routine is similar to CharByte; it tells you what kind of character 
is indicated given a text buffer pointer and an offset. CharType returns 
additional information about the character, such as to which script it belongs 
and whether it's uppercase or lowercase. 


Text Editing 


Pixel2Char converts a screen position (given in pixels) to a character offset. 
This is useful for determining the character position of a mouse-down event. 


The Char2Pixel routine finds the screen position (in pixels) of insertion 
points, selections, and so on, given a text buffer pointer and a length. 


The FindWord routine can be used to find word boundaries within text. It takes 
an optional breakTable parameter which can be used to change its function for a 
particular script. For word wrapping or selection, application programs can 
call Pixel2Char to find a character offset and FindWord to find the boundaries 
of a word. 


The HiliteText routine is used to find the appropriate sections of text to be 
highlighted. It allows applications to be independent of the direction of text. 
The right-to-left languages are actually bidirectional, with mixed blocks of 
left-to-right and right-to-left text. Using this routine allows applications to 
highlight properly with left-to-right or with bidirectional scripts. 


The DrawJust and MeasureJust routines can be used to draw and measure text that 
is fully justified. These routines take a justification gap argument, which 
determines how much justification is to be done. The justification gap is the
difference between the normal width of the text, as measured by TextWidth, and
the desired margins after justification has taken place. A justification gap of 
zero causes these routines to behave like the QuickDraw DrawText and MeasureText 
routines. 
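
As a worked illustration of the justification gap, the following C sketch
computes it from two pixel widths. The function name is hypothetical, and
treating text that is already too wide as a gap of zero (no justification)
is an assumption of this sketch, not a documented rule.

```c
/* Extra pixels to distribute across the line: the desired width after
   justification minus the normal width of the text. */
int justification_gap(int desired_width, int natural_width)
{
    int gap = desired_width - natural_width;
    return gap > 0 ? gap : 0;   /* assumption: never pass a negative gap */
}
```

For example, text that normally measures 180 pixels and must fill a
200-pixel line has a justification gap of 20.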


Pixel2Char and Char2Pixel also take the justification gap argument, so they can 
be used on fully justified text. 


Advanced Routines 


The Transliterate routine converts text to the closest approximation in a 
different script or type of character. The primary use of this routine for 
developers is to convert uppercase text to lowercase and vice versa. 


The Font2Script routine can be used to map an arbitrary font number to the 
appropriate script. By using Font2Script and KeyScript, for example, your 
program can set the keyboard to correspond to the user's font selection. 


System Routines 


The GetEnvirons and SetEnvirons routines can be used to retrieve or to modify 
the global variables maintained for all scripts. Each script also has its own 
set of local variables and routine vectors. The GetScript and SetScript 
routines perform the same functions as GetEnvirons and SetEnvirons, but they 
work with the local area of the specified script. 


Warning: Changing the local variables of a script while it is running can be 
dangerous. Be sure you know what you are doing before attempting 
it, following the guidelines in the documentation for the particular 
Script Interface System. Save the original values of the variables 
you change, and restore them as soon as possible. 


The GetEnvirons and SetEnvirons routines either pass or return a long integer.
The actual values that are loaded or stored can be long integers, integers, or
signedBytes. If the value is not a long integer, then it is stored in the
low-order word or byte of the long integer. The remaining bytes in the value
should be zero with SetScript and SetEnvirons, and are set to zero with
GetScript and GetEnvirons.
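
The packing convention can be sketched in C as follows. The helper names are
hypothetical and nothing here calls the Toolbox; the sketch only shows how a
word or signed byte sits in the low-order part of the long integer with the
remaining bytes zero.

```c
#include <stdint.h>

/* Store a 16-bit value in the low-order word; the upper bytes are zero. */
uint32_t pack_word(int16_t value)
{
    return (uint32_t)(uint16_t)value;
}

/* Store a signed byte in the low-order byte; the upper bytes are zero. */
uint32_t pack_signed_byte(int8_t value)
{
    return (uint32_t)(uint8_t)value;
}

/* Recover a 16-bit value from the low-order word of the long. */
int16_t unpack_word(uint32_t param)
{
    int32_t low = (int32_t)(param & 0xFFFFu);
    return (int16_t)(low >= 0x8000 ? low - 0x10000 : low);
}

/* Recover a signed byte from the low-order byte of the long. */
int8_t unpack_signed_byte(uint32_t param)
{
    int32_t low = (int32_t)(param & 0xFFu);
    return (int8_t)(low >= 0x80 ? low - 0x100 : low);
}
```

Note that a negative value is not sign-extended under this convention: a
signed byte of -1 travels as $000000FF, so the receiver must take only the
low-order byte before interpreting the sign.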


The GetDefFontSize, GetSysFont, GetAppFont, GetMBarHeight, and GetSysJust 
functions return the current values of specific Script Manager variables. 
SetSysJust is a procedure that lets you adjust the system script justification.


'itl4' RESOURCE


There is a new international resource, 'itl4', which contains information used
by several of the 2.0 routines and must be localized for each script (including
Roman).


In Pascal:

   Itl4Rec = RECORD
      flags:          integer;
      resourceType:   longint;
      resourceNum:    integer;
      version:        integer;
      resHeader1:     longint;
      resHeader2:     longint;
      numTables:      integer;    { one-based }
      mapOffset:      longint;    { offsets are from record start }
      strOffset:      longint;
      fetchOffset:    longint;
      unTokenOffset:  longint;
      defPartsOffset: longint;
      resOffset6:     longint;
      resOffset7:     longint;
      resOffset8:     longint;
      { the rest is data pointed to by offsets}
   END;
   Itl4Ptr    = ^Itl4Rec;
   Itl4Handle = ^Itl4Ptr;


In C:

   struct Itl4Rec {
      short flags;
      long  resourceType;
      short resourceNum;
      short version;
      long  resHeader1;
      long  resHeader2;
      short numTables;        /*one-based*/
      long  mapOffset;        /*offsets are from record start*/
      long  strOffset;
      long  fetchOffset;
      long  unTokenOffset;
      long  defPartsOffset;
      long  resOffset6;
      long  resOffset7;
      long  resOffset8;
   };

   #ifndef __cplusplus
   typedef struct Itl4Rec Itl4Rec;
   #endif

   typedef Itl4Rec *Itl4Ptr, **Itl4Handle;


SCRIPT MANAGER ROUTINES


The Script Manager provides routines that support text manipulation with scripts 
of all kinds. 


Assembly-language note: You can invoke each of the Script Manager routines 
with a macro of the same name preceded by an 
underscore. These macros, however, aren't trap macros 
themselves; instead they expand to invoke the trap 
macro ScriptUtil. The Script Manager then determines 
the routine to execute from the routine selector, a 
long integer that's pushed on the stack. The routine 
selectors are listed in the Script Manager equates 
included with the Macintosh Programmer's Workshop, 
Version 2.0 and higher. 


CharByte 
FUNCTION CharByte (textBuf: Ptr; textOffset: Integer) : Integer;


CharByte is used to check the character type of the byte at the given offset
(using an offset of zero for the first character in the buffer). It can return
the following values:


Value Meaning 

-1 First byte of a multibyte character 
0 Single-byte character 
1 Last byte of multibyte character 
2 Middle byte of multibyte character 


CharType 
FUNCTION CharType (textBuf: Ptr; textOffset: Integer) : Integer; 


CharType is an extension of CharByte which returns more information about the 
given byte. 


Note: If the byte indicated by the offset is not the last or the only byte 
of a character, the offset should be incremented until the CharType 
call is made for the lowest-order byte. 

The format of the return value is an integer with the following structure:


Bits     Contents

0-3      Character type
4-7      Reserved
8-11     Character class (subset of type)
12       Reserved
13       Direction
14       Character case
15       Character size


Each Script Interface System defines constants for the different types of
characters. The following predefined constants are available to help you access
the CharType return value for the Roman script:

CONST

    { CharType character types }

    smCharPunct   = 0;
    smCharAscii   = 1;
    smCharEuro    = 7;

    { CharType character classes }

    smPunctNormal = $0000;
    smPunctNumber = $0100;
    smPunctSymbol = $0200;
    smPunctBlank  = $0300;

    { CharType directions }

    smCharLeft    = $0000;
    smCharRight   = $2000;

    { CharType character case }

    smCharLower   = $0000;
    smCharUpper   = $4000;

    { CharType character size (1 or 2 byte) }

    smChar1byte   = $0000;
    smChar2byte   = $8000;


For example, if the character indicated were an uppercase "A" (single-byte),
then the value of the result would be smCharAscii + smCharUpper. Blank
characters are indicated by a type smCharPunct and a class smPunctBlank.
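

The bit layout lends itself to simple mask tests. The following C sketch
decodes a CharType result using the Roman-script constants listed above; the
mask names (kTypeMask and so on) and the helper functions are hypothetical,
derived only from the bit table, and are not claimed to match any interface
file.

```c
#include <assert.h>

/* Constants from the CONST list, written in C notation. */
enum {
    smCharPunct   = 0,      smCharAscii   = 1,
    smPunctNormal = 0x0000, smPunctNumber = 0x0100,
    smPunctSymbol = 0x0200, smPunctBlank  = 0x0300,
    smCharLeft    = 0x0000, smCharRight   = 0x2000,
    smCharLower   = 0x0000, smCharUpper   = 0x4000,
    smChar1byte   = 0x0000, smChar2byte   = 0x8000
};

/* Hypothetical mask names, one per field of the return value. */
enum {
    kTypeMask  = 0x000F,    /* bits 0-3:  character type  */
    kClassMask = 0x0F00,    /* bits 8-11: character class */
    kDirMask   = 0x2000,    /* bit 13:    direction       */
    kCaseMask  = 0x4000,    /* bit 14:    character case  */
    kSizeMask  = 0x8000     /* bit 15:    character size  */
};

static int char_type(unsigned short r)        { return r & kTypeMask; }
static int char_class(unsigned short r)       { return r & kClassMask; }
static int is_right_to_left(unsigned short r) { return (r & kDirMask) == smCharRight; }
static int is_uppercase(unsigned short r)     { return (r & kCaseMask) == smCharUpper; }
static int is_double_byte(unsigned short r)   { return (r & kSizeMask) == smChar2byte; }

/* A blank is punctuation (type) of the blank class. */
static int is_blank(unsigned short r)
{
    return char_type(r) == smCharPunct && char_class(r) == smPunctBlank;
}
```

With these helpers, the uppercase "A" example (smCharAscii + smCharUpper)
reports type smCharAscii, uppercase, single-byte, and left-to-right.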


Pixel2Char 


FUNCTION Pixel2Char (textBuf: Ptr; textLen, slop, pixelWidth: Integer; 
VAR leftSide: Boolean) : Integer; 


Pixel2Char should be used to find the nearest character offset within a text
buffer corresponding to a given pixel width. It returns the offset of the
character that pixelWidth is closest to. It is the inverse of the Char2Pixel
routine.


The leftSide flag is set if the pixel width falls within the left side of a 
character. This flag can be used for word selection, and for positioning the 
cursor correctly at the end of lines. For example, during word selection if the 
character offset is at the end of a word and the leftSide flag is on, then the 
double click was actually on the following character, and the preceding word 
should not be selected. 


The slop argument is used for justified text. It specifies how many extra pixels 
must be added to the length of the string. If the text is not justified, pass a 
slop value of zero. 
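

As an illustration only, the following C sketch shows one reading of the
offset and leftSide results for left-to-right, unjustified text (slop of
zero). It is a stand-in, not the Pixel2Char implementation: the function name
is hypothetical, and it assumes an array of cumulative widths in which entry i
holds the width of the first i characters.

```c
#include <assert.h>

/* Hypothetical stand-in for the left-to-right, zero-slop case.
   charLocs[i] is the width of the first i characters (n + 1 entries).
   Returns the nearest character boundary; *leftSide is set to 1 when the
   pixel lies in the left half of the character at the returned offset. */
static int pixel_to_offset(const int *charLocs, int n, int pixelWidth,
                           int *leftSide)
{
    int i;

    /* This sketch only covers pixels that fall inside the text. */
    assert(n > 0 && pixelWidth >= 0 && pixelWidth < charLocs[n]);

    for (i = 0; i < n; i++) {
        int left  = charLocs[i];
        int right = charLocs[i + 1];
        if (pixelWidth < right) {
            int mid = left + (right - left) / 2;
            if (pixelWidth < mid) {
                *leftSide = 1;      /* left half: boundary before char i */
                return i;
            }
            *leftSide = 0;          /* right half: boundary after char i */
            return i + 1;
        }
    }
    *leftSide = 0;                  /* not reached, given the assertion */
    return n;
}
```

For three characters that are each 10 pixels wide, a pixel width of 12 yields
offset 1 with leftSide set, while a pixel width of 7 yields offset 1 with
leftSide clear: the same boundary, reached from different characters.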


Char2Pixel 


FUNCTION Char2Pixel (textBuf: Ptr; textLen, slop, offset: Integer; 
direction: SignedByte): Integer; 


Char2Pixel is the inverse of Pixel2Char; it should be used to find the screen
position of carets and selection points, given the text and length. For
left-to-right scripts (including kanji), this routine works the same way as
TextWidth. For other scripts, it works differently. The parameters are the
same as in Pixel2Char, except for the direction.


The direction argument indicates whether Char2Pixel is being called to determine 
where the caret should appear or to find the endpoints for highlighting. For 
unidirectional scripts such as Roman, it should have the value 1. The following 
predefined constants are available for specifying the direction: 


CONST
    smLeftCaret  = 0;     {place caret for left block}
    smRightCaret = -1;    {place caret for right block}
    smHilite     = 1;     {direction is TESysJust}


Like Pixel2Char, this routine can handle fully justified text. If the text is 
not justified, pass a slop value of zero. 


Although Char2Pixel uses TextWidth (with Roman script), the arguments passed are 
not the same. TextWidth, for ease of calling from Pascal, takes a byteCount 
argument which is redundant. The length and offset for Char2Pixel are not 
equivalent; the routine needs the context of the complete text in order to 
determine the correct value. For example, if myPtr is a pointer to the text
'abcdefghi', with the cursor between the 'd' and the 'e' (and no justification),
the call would be


pixelWidth := Char2Pixel(myPtr, 9, 0, 4, 1); 


When Char2Pixel is used to blink the insertion point, the direction parameter
to Char2Pixel should depend on the keyboard script. The call can look like this:


keyDirection := GetScript(GetEnvirons(smKeyScript), smScriptRight);
pixelWidth := Char2Pixel(myPtr, 9, 0, 4, keyDirection);


However, the keyboard script may change between drawing and erasing the
insertion point. An application should remember the position where it drew the
cursor, then erase (invert) at that position again. This can be done by
remembering the keyDirection, the pixel width, or even the whole rectangle. For
example, if the application remembers the keyDirection by declaring it as a
global variable, code like this could be used:


drawingInsertion := true;    {when window is activated}


{to blink the insertion point}
IF drawingInsertion THEN
   BEGIN {drawing}
      keyDirection := GetScript(GetEnvirons(smKeyScript), smScriptRight);
      pixelWidth := Char2Pixel(myPtr, myLength, mySlop, myOffset, keyDirection);
      {Get the vertical position for the insertion point, then invert }
      { the appropriate rectangle}
   END
ELSE
   BEGIN {erasing}
      {keyDirection still holds the value saved when the caret was drawn}
      pixelWidth := Char2Pixel(myPtr, myLength, mySlop, myOffset, keyDirection);
      {Get the vertical position for the insertion point, then invert }
      { the appropriate rectangle}
   END; {blinking}
drawingInsertion := not drawingInsertion;


FindWord 


PROCEDURE FindWord (textPtr: Ptr; textLength, offset: Integer; leftSide: Boolean;
                    breaks: BreakTable; VAR offsets: OffsetTable);


FindWord takes a text string, passed in the textPtr and textLength parameters,
and a position in the string, passed as an offset. The leftSide flag has the
same meaning here as in the Pixel2Char routine. FindWord returns two offsets in
the offset table which specify the boundaries of the word selected by the offset
and leftSide. For example, if the text "This is it" were passed with an
offset and leftSide that selected the first word, the offset pair returned would
be (0,4).


FindWord uses a break table—a list of word-division templates—to determine the 
boundaries of a word. If the breaks parameter is NIL, the default word- 
selection break table for the current script is used. If it is POINTER(-1), 
then the default word-wrapping break table is used. If the breaks parameter has 
another value, it should point to a valid break table, which will be used in 
place of the default table. For information about constructing alternate break 
tables, contact Developer Technical Support. 


Word-selection break tables are used to find boundaries of words for word
selection, dragging, spelling checking, and so on. Word-wrapping break tables
are used to distinguish words for finding the widths of lines for wrapping. Word
selection generally makes finer distinctions than word wrapping. For example,
the default word-selection break table for Roman script yields three words in
the string (here): (, here, and ). For word wrapping, on the other hand, this
string is considered to be one word.
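

The way an offset and leftSide flag select a word can be sketched in C. This
is a deliberately simplified stand-in that splits text only on white space; it
does not use break tables, and the names WordBounds and find_word_simple are
hypothetical. It assumes that leftSide selects the character at the offset and
that a clear leftSide selects the character just before it.

```c
#include <assert.h>
#include <ctype.h>

typedef struct { int first; int second; } WordBounds;   /* hypothetical */

static int is_space_at(const char *text, int index)
{
    return isspace((unsigned char)text[index]) != 0;
}

/* Simplified stand-in for FindWord: a "word" is a run of characters that
   are all white space or all non-white-space. */
static WordBounds find_word_simple(const char *text, int length,
                                   int offset, int leftSide)
{
    WordBounds w;
    int pos, wantSpace;

    assert(length > 0 && offset >= 0 && offset <= length);

    /* Pick the character that the offset/leftSide pair refers to. */
    pos = leftSide ? offset : offset - 1;
    if (pos < 0) pos = 0;
    if (pos > length - 1) pos = length - 1;
    wantSpace = is_space_at(text, pos);

    /* Grow the run in both directions while the class stays the same. */
    w.first = pos;
    while (w.first > 0 && is_space_at(text, w.first - 1) == wantSpace)
        w.first--;
    w.second = pos + 1;
    while (w.second < length && is_space_at(text, w.second) == wantSpace)
        w.second++;
    return w;
}
```

For the text "This is it", an offset of 4 with leftSide clear refers to the "s"
and gives the pair (0,4); the same offset with leftSide set refers to the
following space and gives (4,5).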


HiliteText


PROCEDURE HiliteText (textPtr: Ptr;
                      textLength, firstOffset, secondOffset: Integer;
                      VAR offsets: OffsetTable);


HiliteText is used to find the characters between two offsets that should be
highlighted. The offsets are passed in firstOffset and secondOffset, and
the result is returned in the offsets parameter, an offset table.


The offsetTable can be thought of as a set of three offset pairs. If the two 
offsets in any pair are equal, the pair is empty and can be skipped. Otherwise 
the pair identifies a run of characters. Char2Pixel can be used to convert the 
offsets into pixel widths, if necessary. 
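

Walking the three pairs can be sketched as follows in C. The declarations of
OffPair and OffsetTable, including the field names offFirst and offSecond, are
assumptions made for this example, as is the helper collect_runs; only the
rule that a pair with equal offsets is empty comes from the description above.

```c
#include <assert.h>

/* Assumed layout: three pairs of 16-bit offsets. */
typedef struct { short offFirst; short offSecond; } OffPair;
typedef OffPair OffsetTable[3];

/* Copies the non-empty pairs, in order, to runs[] and returns how many
   were found. A pair whose two offsets are equal is empty and skipped. */
static int collect_runs(const OffsetTable offsets, OffPair runs[3])
{
    int i, count = 0;

    for (i = 0; i < 3; i++) {
        if (offsets[i].offFirst == offsets[i].offSecond)
            continue;                       /* empty pair */
        runs[count++] = offsets[i];
    }
    return count;
}
```

An application would then highlight each collected run in turn, converting
its offsets to pixel positions where needed.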


The offsetTable requires three offset pairs because in bidirectional scripts a
single selection may comprise up to three physically discontinuous segments. In
the Arabic script, for example, Arabic words are written right-to-left while
English words in the same line are written left-to-right. Thus the selection of
a section of Arabic containing an English word can appear as shown in Figure 3.


Figure 3-Example of Bidirectional Selection


HiliteText returns the specific regions to be highlighted in this case as an
offset table.


DrawJust 
PROCEDURE DrawJust (textPtr: Ptr; textLength, slop: Integer); 


DrawJust is similar to the QuickDraw DrawText routine. It draws the given text 
at the current pen location in the current font, style, and size. The slop 
parameter indicates how many extra pixels are to be added to the width of the 
string when it is drawn. This is useful for justifying text. 


MeasureJust 
PROCEDURE MeasureJust (textPtr: Ptr; textLength, slop: Integer; charLocs: Ptr); 


MeasureJust is similar to the QuickDraw MeasureText routine. The charLocs
parameter should point to an array of textLength+1 integers; MeasureJust will
fill it with the TextWidths of the first textLength characters of the text
pointed to by textPtr. The first entry in the array will return the width of
zero characters, the second the width of the first character, the third the
width of the first and second characters, and so forth.
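

The layout of the charLocs array can be shown with a small C sketch. It does
not measure text; it assumes the caller already has a per-character width for
each character (the widths array and the function name are hypothetical) and
only demonstrates why the array needs one more entry than there are
characters.

```c
#include <assert.h>

/* Fills charLocs[0..textLength] with running totals: entry i holds the
   width of the first i characters, so entry 0 is always zero. */
static void fill_char_locs(const short *widths, int textLength,
                           short *charLocs)
{
    int i;

    assert(textLength >= 0);
    charLocs[0] = 0;
    for (i = 0; i < textLength; i++)
        charLocs[i + 1] = (short)(charLocs[i] + widths[i]);
}
```

The last entry is therefore the width of the whole run, and the width of any
single character is the difference between two adjacent entries.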


Transliterate 


FUNCTION Transliterate (srcHandle, dstHandle: Handle; target: Integer; 
srcMask: Longint): Integer; 


Transliterate converts the given text to the closest possible approximation in a 
different script or type of character. It is the caller's responsibility to 
provide storage and dispose of it. The srcMask indicates which character types 
(scripts) in the source are to be converted. For example, Japanese text may 
contain Roman, hiragana, katakana, and kanji characters. The source mask could 
be used to limit transliteration to hiragana characters only. 


The target value specifies what the text is to be transliterated into. The low
byte of the target is the format to convert to. A value of -1 means the system
script. The high byte contains modifiers, which depend on the specific script
number. The following predefined constants are available to help you specify
target values:


Constant          Value    Meaning

smTransAscii          0    Target is Roman script
smTransNative         1    Target is non-Roman script
smTransCase           2    Switch case for any target
smTransLower      16384    Target becomes lowercase
smTransUpper      32768    Target becomes uppercase
smMaskAscii           1    Convert only Roman script
smMaskNative          2    Convert only non-Roman script
smMaskAll            -1    Convert all text


The result is 0 for noErr or -1 for transliteration not available.


Transliteration is performed on a "best effort" basis: typically it will be
designed to give a unique transliteration into the non-Roman script. This may
not be the most phonetic or natural transcription, since those transcriptions
are usually ambiguous (for example, in certain transcriptions "th" may refer to
the sound in the, the sound in thick, or the sounds in boathouse).


On Roman systems, this routine is typically used to change case. For example,
to convert all the characters in a block of text to single-byte Roman
(uppercase), the value of srcMask would be smMaskAll, and target would be
smTransUpper+smTransAscii. Each of the Script Interface Systems defines
additional target constants to be used during transliteration.


Here are some examples of the effects of transliteration:


Source text       Result

to uppercase      TO UPPERCASE
TO LOWERCASE      to lowercase
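

The split of the target value into a low-byte format and high-byte modifiers
can be sketched in C. The constants are those from the table; the TargetWord
type and the helper names are hypothetical. Note that 32768 does not fit in a
signed 16-bit integer, so this sketch holds the target in an unsigned 16-bit
value.

```c
#include <assert.h>

enum {
    smTransAscii  = 0,
    smTransNative = 1,
    smTransLower  = 16384,
    smTransUpper  = 32768
};

typedef unsigned short TargetWord;      /* hypothetical type name */

/* Combine a format (low byte) with modifiers (high byte). */
static TargetWord make_target(int format, int modifiers)
{
    assert((format & ~0xFF) == 0);          /* format must fit the low byte  */
    assert((modifiers & ~0xFF00) == 0);     /* modifiers live in the high byte */
    return (TargetWord)(modifiers | format);
}

static int target_format(TargetWord t)    { return t & 0x00FF; }
static int target_modifiers(TargetWord t) { return t & 0xFF00; }
```

The uppercase example corresponds to make_target(smTransAscii, smTransUpper),
which has the same value as smTransUpper+smTransAscii.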


GetScript 
FUNCTION GetScript (script, verb: Integer) : LongInt;


See also Technical Note #243.


GetScript is used to retrieve the values of the local script variables and
routine vectors. The following predefined constants are available for the verb
parameter:


Constant            Value    Meaning

smScriptVersion         0    Software version
smScriptMunged          2    Script entry changed count
smScriptEnabled         4    Script enabled flag
smScriptRight           6    Right-to-left flag
smScriptJust            8    Justification flag
smScriptRedraw         10    Word redraw flag
smScriptSysFond        12    Preferred system font
smScriptAppFond        14    Preferred application font
smScriptNumber         16    Script 'itl0' ID, from dictionary
smScriptDate           18    Script 'itl1' ID, from dictionary
smScriptSort           20    Script 'itl2' ID, from dictionary
smScriptFlags          22    Script flags word
smScriptToken          24    'itl4' ID number
smScriptRsvd           26    Reserved
smScriptLang           28    Script's language code
smScriptNumDate        30    Number/date representation codes
smScriptKeys           32    Script 'KCHR' ID, from dictionary
smScriptIcon           34    Script 'SICN' ID, from dictionary
smScriptPrint          36    Script printer action routine
smScriptTrap           38    Trap entry pointer
smScriptCreator        40    Script file creator
smScriptFile           42    Script file name
smScriptName           44    Script name


Verb values unique to a script are defined by the applicable Script Interface 
System. GetScript returns 0 if the verb value is not recognized or if the 
specified script is not installed. 


SetScript 


FUNCTION SetScript (script, verb: Integer; param: LongInt) : OSErr;


SetScript is the opposite of GetScript. It is used to change the local script
variables and routine vectors and uses the same verb values as GetScript. The
value smVerbNotFound is returned if the verb value is not recognized or the
script specified is not installed. Otherwise, the function result will be
noErr.


It is a good idea to first retrieve the original value of the global
variable that you want to change, using GetScript. The original value can then
be restored with a second call to SetScript as soon as possible.


GetEnvirons 

FUNCTION GetEnvirons (verb: Integer) : LongInt;


See also Technical Note #243.


GetEnvirons is used to retrieve the values of the global Script Manager
variables and routine vectors. The following predefined constants are available
for the verb argument:


Constant         Value    Meaning

smVersion            0    Environment version
smMunged             2    Globals changed count
smEnabled            4    Environment enabled flag
smBiDirect           6    Set if scripts of different directions
                          are installed together
smFontForce          8    Force font flag
smIntlForce         10    Force international utilities flag
smForced            12    Current script forced to system script
smDefault           14    Current script defaulted to Roman script
smPrint             16    Printer action routine
smSysScript         18    System script
smLastScript        20    Last keyboard script
smKeyScript         22    Keyboard script
smSysRef            24    System folder reference number
smKeyCache          26    Keyboard table cache pointer
smKeySwap           28    Swapping table pointer
smGenFlags          30    General flags
smOverride          32    Script override flags
smCharPortion       34    Ch vs Sp Extra proportion, 4.12 fixed


This routine returns 0 if the verb is not recognized. 
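

The smCharPortion value is described as a 4.12 fixed-point number, that is, 4
integer bits followed by 12 fraction bits in a 16-bit word. A minimal C sketch
of the conversion follows; the function names are hypothetical, and treating
the value as unsigned is an assumption of this example.

```c
#include <assert.h>

/* One unit of the fraction is 1/4096 (2 to the power -12). */
static double fixed_4_12_to_double(unsigned short raw)
{
    return raw / 4096.0;
}

/* Round to the nearest representable 4.12 value; the caller is expected
   to pass a value in the range 0 up to (but not including) 16. */
static unsigned short double_to_fixed_4_12(double value)
{
    assert(value >= 0.0 && value < 16.0);
    return (unsigned short)(value * 4096.0 + 0.5);
}
```

For example, the raw word $1000 represents 1.0 and $0800 represents 0.5.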


SetEnvirons 
FUNCTION SetEnvirons (verb: Integer; param: LongInt) : OSErr; 


SetEnvirons is the opposite of GetEnvirons. It is used to change the global 
Script Interface System variables and routine vectors; it uses the same verbs as 
GetEnvirons. The value smVerbNotFound is returned if the verb is not 
recognized. Otherwise, the function result will be noErr. 


It is a good idea to first retrieve the original value of the global variable
that you want to change, using GetEnvirons. The original value can then be
restored with a second call to SetEnvirons as soon as possible.
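

The save-and-restore pattern can be sketched in C. So that the example can run
on its own, it uses mock versions of the two routines backed by a small array;
MockGetEnvirons, MockSetEnvirons, with_temporary_value, and the error code
kVerbNotFound (a placeholder value, not the real smVerbNotFound) are all
hypothetical. Only the verb numbers come from the table above.

```c
#include <assert.h>

enum { kNoErr = 0, kVerbNotFound = -1 };    /* placeholder error codes */
enum { smFontForce = 8, smIntlForce = 10 }; /* verbs from the table    */

static long gGlobals[18];   /* mock storage: one slot per even verb 0..34 */

static int verb_is_known(short verb)
{
    return verb >= 0 && verb <= 34 && (verb & 1) == 0;
}

static long MockGetEnvirons(short verb)
{
    return verb_is_known(verb) ? gGlobals[verb / 2] : 0;
}

static short MockSetEnvirons(short verb, long param)
{
    if (!verb_is_known(verb))
        return kVerbNotFound;
    gGlobals[verb / 2] = param;
    return kNoErr;
}

/* Change a variable, run the operation, then restore the saved value. */
static short with_temporary_value(short verb, long tempValue,
                                  void (*operation)(void))
{
    long  saved = MockGetEnvirons(verb);
    short err   = MockSetEnvirons(verb, tempValue);

    if (err != kNoErr)
        return err;
    if (operation)
        operation();
    return MockSetEnvirons(verb, saved);    /* restore as soon as possible */
}

/* Example operation: records what it sees while the change is in effect. */
static long gObserved;
static void record_font_force(void)
{
    gObserved = MockGetEnvirons(smFontForce);
}
```

Keeping the change and the restore inside one helper makes it harder to leave
a global variable modified by accident.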


FontScript 
FUNCTION FontScript: Integer; 


FontScript returns the script code for the font script. The font script is 
determined by the font of the current grafPort. 


IntlScript 
FUNCTION IntlScript: Integer; 


IntlScript returns the script code for the International Utilities script. Like 
the font script, the International Utilities script is determined by the font of 
the current grafPort. If the Script Manager global IntlForce is off, then 
IntlScript is the same as the font script; if IntlForce is on, IntlScript is the 
system script. For further information, see the International Utilities Package 
chapter in this volume. 


KeyScript 
PROCEDURE KeyScript(scriptCode: Integer) ; 


KeyScript is used to set the keyboard script. This routine also changes the 
keyboard layout to that of the new keyboard script and draws the script icon for 
the new keyboard script in the upper-right corner of the menu bar. 


Warning: Applications can also change the keyboard script without changing
         the keyboard layout or the script icon in the menu bar, by calling
         the SetEnvirons routine with the smKeyScript verb. However, this
         method should only be used to momentarily change the keyboard script
         to perform a special operation. Changing the keyboard script without
         changing the keyboard layout violates the user interface paradigm and
         will cause problems for other Script Manager routines.


Font2Script 
FUNCTION Font2Script(fontNumber: Integer): Integer;


Font2Script translates a font identification number into a script code. This
routine is useful for determining to which script a particular font belongs and
which fonts are usable under a particular script.


GetDefFontSize 
FUNCTION GetDefFontSize: Integer; 


GetDefFontSize fetches the size of the current default font. This routine is in 
the Pascal interface, not in ROM; it cannot be used with the 64K ROM. 


GetSysFont 
FUNCTION GetSysFont: Integer;


GetSysFont fetches the identification number of the current system font. This
routine is in the Pascal interface, not in ROM; it cannot be used with the 64K
ROM.


GetAppFont 
FUNCTION GetAppFont: Integer; 


GetAppFont fetches the identification number of the current application font. 
This routine is in the Pascal interface, not in ROM. 


GetMBarHeight 
FUNCTION GetMBarHeight: Integer;


GetMBarHeight fetches the height of the menu bar as required to hold menu titles
in its current font. This routine is in the Pascal interface, not in ROM; it
cannot be used with the 64K ROM.


GetSysJust
FUNCTION GetSysJust: Integer;


GetSysJust returns the value of a global variable that represents the direction
in which lines written in the system script are justified: 0 for left
justification (the default case) or -1 for right justification. This routine is
in the Pascal interface, not in ROM; it cannot be used with the 64K ROM.


SetSysJust 
PROCEDURE SetSysJust (newJust: Integer); 


SetSysJust sets a global variable that represents the direction in which lines
written in the system script are justified: 0 for left justification (the
default case) or -1 for right justification. This routine is in the Pascal
interface, not in ROM; it cannot be used with the 64K ROM.


SCRIPT MANAGER 2.0 ROUTINES 


The new text routines support lexically interpreting different scripts (for
example, in macro languages); allotting justification to different format runs
within a line; ordering format runs properly with bidirectional text (Hebrew
and Arabic); quickly separating Roman from non-Roman text; and determining
word wrap in text processing. The performance of the International Utilities
text comparison routines was also significantly improved, by amounts ranging
from 25% to 94%.


ParseTable 
In Pascal:

    Type
        CharByteTable = Packed Array [0..255] of SignedByte;

    Function ParseTable(table: CharByteTable): Boolean;

In C:

    typedef char CharByteTable[256];

    pascal Boolean ParseTable(CharByteTable table);


Double-byte characters have distinctive high (first) bytes, which allows them to
be distinguished from single-byte characters. The ParseTable routine can be
used to traverse double-byte text quickly. It does this by filling a table of
bytes with values which indicate the extra number of bytes taken by a given
character. This array can then be used instead of making function calls on each
byte. As with the other script-specific routine calls, the values in the table
will vary with the script of the current font in thePort, so you must make sure
to set the font correctly.


An entry in the table is set to 0 for a single-byte character and 1 for the 
first byte of a double-byte character. (With a single-byte script, the entries 
are all zero.) The return value from the routine will always be true. This 
routine has always been present in the Script Manager, but was not documented 
until now. Also note that script systems will never require more than two bytes 
per character, so you can safely assume that there are only single-byte and 
double-byte characters. 


For example, in the following code the reference to tablePtr[myChar] is 
functionally equivalent to a use of CharByte, but does not involve a trap call. 


In Pascal:

    Var
        myChar:      Integer;
        i, max:      Integer;
        tablePtr:    CharByteTable;
        s:           String[255];
        ParseResult: Boolean;

    Begin
        ParseResult := ParseTable(tablePtr);
        i := 1;
        max := length(s);
        While i <= max do Begin
            myChar := ord(s[i]);                        {get byte}
            i := i + 1;                                 {skip to start of next}
            if (tablePtr[myChar] <> 0) then Begin       {if double-byte}
                myChar := myChar * $100 + ord(s[i]);    {include next byte}
                i := i + 1;                             {skip to start of next}
            End;
            {do something with myChar}
        End;
    End;

In C:

    short          mychar;
    CharByteTable  table;
    unsigned char  *s = (unsigned char *) "Test String";
    Boolean        ParseResult;

    ParseResult = ParseTable(table);

    while ( *s ) {
        mychar = *s++;                          /* get byte */
        if ( table[mychar] != 0 )               /* if double-byte */
            mychar = (mychar << 8) + *s++;      /* include next byte */
        /* do something with mychar */
    }


Remember that the CharByteTable is specific to the script. There could be two 
or three scripts installed that are double-byte and have different CharByteTable 
arrays. 
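

The same table-driven traversal can be used to count characters rather than
bytes. The following C sketch runs without the Script Manager: in place of
ParseTable it fills the table from a made-up lead-byte range ($81 through
$9F), chosen only for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef char CharByteTable[256];

/* Stand-in for ParseTable: marks a hypothetical lead-byte range with 1
   (one extra byte) and everything else with 0 (single-byte). */
static void fill_sample_table(CharByteTable table)
{
    int b;

    for (b = 0; b < 256; b++)
        table[b] = (b >= 0x81 && b <= 0x9F) ? 1 : 0;
}

/* Counts characters in a buffer of `length` bytes. Each step advances by
   one byte plus the number of extra bytes the table gives for that byte. */
static size_t count_characters(const unsigned char *text, size_t length,
                               const CharByteTable table)
{
    size_t i = 0, count = 0;

    while (i < length) {
        i += 1 + (size_t)table[text[i]];
        count++;
    }
    return count;
}
```

The text is read through an unsigned char pointer so that bytes above $7F index
the table correctly.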


IntlTokenize 
In Pascal:

    Function IntlTokenize (tokenParam: TokenBlockPtr): TokenResults;

In C:

    pascal TokenResults IntlTokenize(TokenBlockPtr tokenParam);


The IntlTokenize routine is intended for use in macro expressions and similar
programming constructs intended for general users. It allows the program to
recognize variables, symbols, and quoted literals without depending on the
particular natural language (for example, English or Japanese).


The routine is a mildly programmable regular expression recognizer for parsing
text into tokens. The single parameter is a parameter block describing the text
to be tokenized, the destination of the token stream, the 'itl4' resource
handle, and the various programmable options. IntlTokenize will return a list
of tokens found in the text.
In Pascal:

    TokenType = Integer;    {see list of TokenType values at end of document}

    TokenRec = RECORD
        theToken:       TokenType;
        position:       Ptr;          {ptr into original source}
        length:         LongInt;      {length of text in original source}
        stringPosition: StringPtr;    {Pascal/C string copy of identifier}
    END;

    TokenBlock = RECORD
        source:         Ptr;          {pointer to stream of characters}
        sourceLength:   LongInt;      {length of source stream}
        tokenList:      Ptr;          {pointer to array of tokens}
        tokenLength:    LongInt;      {maximum length of TokenList}
        tokenCount:     LongInt;      {number of tokens generated by }
                                      { tokenizer}
        stringList:     Ptr;          {pointer to stream of identifiers}
        stringLength:   LongInt;      {length of string list}
        stringCount:    LongInt;      {number of bytes currently used}
        doString:       Boolean;      {make strings & put into }
                                      { StringList}
        doAppend:       Boolean;      {append to TokenList rather }
                                      { than replace}
        doAlphanumeric: Boolean;      {identifiers may include numeric}
        doNest:         Boolean;      {do comments nest?}
        leftDelims, rightDelims:   ARRAY[0..1] OF TokenType;
        leftComment, rightComment: ARRAY[0..3] OF TokenType;
        escapeCode:     TokenType;    {escape symbol code}
        decimalCode:    TokenType;    {decimal symbol code}
        itlResource:    Handle;       {itl4 resource handle of }
                                      { current script}
        reserved:       ARRAY[0..7] OF LongInt;    { must be zeroed! }
    END;


In C:

    typedef short TokenType;

    struct TokenRec {
        TokenType  theToken;
        Ptr        position;        /*pointer into original source*/
        long       length;          /*length of text in original source*/
        StringPtr  stringPosition;  /*Pascal/C string copy of identifier*/
    };

    struct TokenBlock {
        Ptr        source;          /*pointer to stream of characters*/
        long       sourceLength;    /*length of source stream*/
        Ptr        tokenList;       /*pointer to array of tokens*/
        long       tokenLength;     /*maximum length of TokenList*/
        long       tokenCount;      /*number tokens generated by tokenizer*/
        Ptr        stringList;      /*pointer to stream of identifiers*/
        long       stringLength;    /*length of string list*/
        long       stringCount;     /*number of bytes currently used*/
        Boolean    doString;        /*make strings & put into StringList*/
        Boolean    doAppend;        /*append to TokenList rather than replace*/
        Boolean    doAlphanumeric;  /*identifiers may include numeric*/
        Boolean    doNest;          /*do comments nest?*/
        TokenType  leftDelims[2];
        TokenType  rightDelims[2];
        TokenType  leftComment[4];
        TokenType  rightComment[4];
        TokenType  escapeCode;      /*escape symbol code*/
        TokenType  decimalCode;
        Handle     itlResource;     /*handle to itl4 resource of current script*/
        long       reserved[8];     /*must be zero!*/
    };

    #ifndef __cplusplus
    typedef struct TokenBlock TokenBlock;
    #endif

    typedef TokenBlock *TokenBlockPtr;


For the TokenBlock record:


source          is a pointer to the beginning of a stream of characters
                (not a Pascal string).

sourceLength    is the number of characters in the source stream.

tokenList       is a pointer to memory allocated by the application for the
                token stream. The tokenizer places the tokens it generates
                at and after the address in tokenList.

tokenLength     is the number of tokens that will fit in the memory pointed
                to by tokenList (not the number of bytes).

tokenCount      is the number of tokens that are currently occupying the space
                pointed to by tokenList. If the doAppend flag is true, then
                tokenCount must be a correct number before calling the
                tokenizer. The tokenizer modifies this value to show how many
                tokens are in the token stream after tokenizing.

stringList      is a pointer to memory allocated by the application for strings
                that the tokenizer generates if the doString flag is true. If
                the flag is false, then stringList is ignored.

stringLength    is the number of bytes of memory allocated for stringList.

stringCount     is the number of bytes that are currently occupying the space
                pointed to by stringList. If the doAppend flag is true, then
                stringCount must be a correct number before calling the
                tokenizer. The tokenizer modifies this value to show how many
                bytes are in the string stream after tokenizing.

doString        is a boolean flag that instructs the tokenizer to create a
                sequence of even-boundaried, null-terminated Pascal strings.
                Each token generated by the tokenizer will have a string
                created to represent it if the flag is true. Each token
                record contains the address of the string that represents it.

doAppend        is a boolean flag that instructs the tokenizer to append tokens
                to the space pointed to by tokenList rather than replace
                whatever is there. tokenCount must correctly reflect the
                number of tokens in the space pointed to by tokenList.

doAlphanumeric  is a boolean flag that, when true, states that numerics may
                be mixed with alphabetics to create alphabetic tokens.

doNest          is a boolean flag that instructs the tokenizer to allow nested
                comments of any depth.

leftDelims      is an array of two integers, each of which corresponds to the
                class of the symbol that may be used as a left delimiter for a
                quoted literal. Double quotes, for instance, is class
                token2Quote. If only one left delimiter is needed, the other
                must be specified to be delimPad.

rightDelims     is an array of two integers, each of which corresponds to the
                class of the symbol that may be used as the matching right
                delimiter for the corresponding left delimiter in leftDelims.

leftComment     is an array of four integers. Each successive pair of two
                describes a pair of tokens that may be used as left delimiters
                for comments. These tokens are stored in reverse order. The
                tokens numbered zero and two are the second tokens of the
                two-token sequences; the tokens numbered one and three are the
                first tokens of the two-token sequences.


                If only one token is needed for a delimiter, the second token
                must be specified to be delimPad. If only one delimiter is
                needed, then both of the tokens allocated for the other symbol
                must be delimPad. The first token of a two-token sequence is
                the higher position in the array. For example, the two left
                delimiters (* and { would be specified as

                    leftComment[0] := tokenAsterisk;     (*asterisk*)
                    leftComment[1] := tokenLeftParen;    (*left parenthesis*)
                    leftComment[2] := delimPad;          (*nothing*)
                    leftComment[3] := tokenLeftCurly;    (*curly brace*)

rightComment    is an array of four integers with similar characteristics as
                leftComment. The positions in the array of the right
                delimiters must be the same as their matching left delimiters.

escapeCode      is a single integer that is the class of the symbol that may be
                used for an escape character. The tokenizer considers the
                escape character to be an escape character (as opposed to being
                itself) only within quoted literals.

                If backslash (\) is given as the escapeCode, then the tokenizer
                would consider it an escape character in the following string:

                    "This is an escape\n"

                It would not be considered an escape character in a non-quoted
                string like the following:

                    This isn't an escape\n

decimalCode     is a single integer that is the tokenType that may be used for
                a decimal point. The tokenizer considers the decimal character
                to be a decimal character (as opposed to being itself) only
                when flanked by numeric or alternate numeric characters, or
                when following them. When the strings option is selected, the
                decimal character will always be transliterated to an ASCII
                period (and alternate numbers will be transliterated to ASCII
                digits).

itlResource     is a handle to the 'itl4' resource of the script in current
                use. The application must load the 'itl4' resource and place
                its handle here before calling the tokenizer. Every time the
                script of the text to be tokenized changes, the handle to the
                respective 'itl4' resource must be placed here.

reserved        locations must all be zeroed.
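

Setting up the delimiter fields is easy to get wrong because of the reversed
order, so a C sketch may help. It declares its own copies of the basic types
so that it can stand alone, and the token codes (kTokAsterisk and so on) are
placeholders with made-up values standing in for tokenAsterisk, delimPad, and
the rest; the helper names are hypothetical. It also assumes that rightComment
uses the same reversed order as leftComment.

```c
#include <assert.h>
#include <string.h>

typedef short          TokenType;
typedef char          *Ptr;
typedef char         **Handle;
typedef unsigned char  Boolean;

/* Placeholder token codes; the values are NOT the real ones. */
enum {
    kDelimPad = -2, kTokLeftParen = 1, kTokRightParen = 2,
    kTokLeftCurly = 3, kTokRightCurly = 4, kTokAsterisk = 5, kTok2Quote = 6
};

typedef struct TokenBlock {
    Ptr       source;        long sourceLength;
    Ptr       tokenList;     long tokenLength;   long tokenCount;
    Ptr       stringList;    long stringLength;  long stringCount;
    Boolean   doString, doAppend, doAlphanumeric, doNest;
    TokenType leftDelims[2], rightDelims[2];
    TokenType leftComment[4], rightComment[4];
    TokenType escapeCode, decimalCode;
    Handle    itlResource;
    long      reserved[8];
} TokenBlock;

/* Store one comment delimiter in slot 0 or 1. The first token goes in the
   higher array position; pass kDelimPad as `second` for a single token. */
static void set_comment_delim(TokenType delims[4], int slot,
                              TokenType first, TokenType second)
{
    assert(slot == 0 || slot == 1);
    delims[2 * slot]     = second;
    delims[2 * slot + 1] = first;
}

static void init_token_block(TokenBlock *tb)
{
    memset(tb, 0, sizeof *tb);              /* also zeroes reserved[] */

    tb->leftDelims[0]  = kTok2Quote;        /* one quoted-literal delimiter */
    tb->leftDelims[1]  = kDelimPad;
    tb->rightDelims[0] = kTok2Quote;
    tb->rightDelims[1] = kDelimPad;

    set_comment_delim(tb->leftComment,  0, kTokLeftParen, kTokAsterisk);   /* (* */
    set_comment_delim(tb->rightComment, 0, kTokAsterisk, kTokRightParen);  /* *) */
    set_comment_delim(tb->leftComment,  1, kTokLeftCurly, kDelimPad);      /* {  */
    set_comment_delim(tb->rightComment, 1, kTokRightCurly, kDelimPad);     /* }  */
}
```

An application would still have to fill in source, tokenList, stringList, the
length fields, and itlResource before calling the tokenizer.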


For the token record:


theToken is the ordinal value of the token represented by the token
record.


position points to the first character in the original text that 
caused this particular token to be generated. 


length is the length in bytes of the original text corresponding 
to this token. 


stringPosition points to a null-terminated, even-boundaried Pascal string
that is the result of using the doString option. If doString 
is false then stringPosition is always set to NIL. 


The available token types are: whitespace, newline, alphabetic, numeric, 
decimal, endOfStream, unknown, alternate numeric, alternate decimal, and a host 
of fixed token symbols, such as (#@: :=. 


The tokenizer does not attempt to provide complete lexical analysis, but rather 
offers a programmable "pre-lex" function whose output should then be processed 
by the application at a lexical or syntactic level. 


The programmable options include: whether to generate strings which correspond 
to the text of each token; whether the current tokenize call is to append to, 
rather than replace, the current token list; whether alphabetic tokens may have 
numerics within them; whether comments may be nested; what the left and right 
delimiters for comments are (up to two sets may be specified); what the left and 
right delimiters for quoted literals are (up to two sets may be specified); what 
the escape character is; and what the decimal point symbol is. 


Some users may use two or more different scripts within a program. However, 
each script's character stream must be passed separately to the tokenizer 
because different resources must be passed to the tokenizer depending on the 
script of the text stream. Appending tokens to the token stream lets the 
application see the tokens generated by the different scripts' characters as a 
single token stream. Restriction: users may not change scripts within a comment 
or quoted literal because these syntactic units must be complete within a single 
call to the tokenizer in order to avoid tokenizer syntax errors. 


The application may specify up to two pairs of delimiters each for both quoted 
literals and comments. Quoted literal delimiters consist of a single symbol, 
and comment delimiters may be either one or two symbols (including newline for 
notations whose comments automatically terminate at the end of lines). The 
characters that compose literals within quoted literals and comments are 
normally defined to have no syntactic significance; however, the escape 
character within a quoted literal does signal that the following character 
should not be treated as the right delimiter. Each delimiter is represented by 
a token, as is the literal between left and right delimiters. 


If two different comment delimiters are specified by the application, then the
doNest flag always applies to both. Comments may be nested if so specified by
the doNest flag, with one restriction that must be strictly observed in order to
prevent the tokenizer from malfunctioning: nesting is legal only if both the
left and right delimiters for the comment token are composed of two symbols
each. In this version, there is limited support for nested comments. When
using this feature, test to ensure that it meets your requirements.


An escape character between left and right delimiters of a quoted literal 
signals that the following character is not the right delimiter. An escape
character is not specially recognized and has no significance outside of quoted 
literals. When an escape character is encountered, the portion of the literal 
before the escape is placed into a single token, the escape character itself 
becomes a token, the character following the escape becomes a token, and the 
portion of the literal following the escape sequence becomes a token. 
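

The splitting rule for escape characters can be sketched in C as follows. This
is only an illustration of the rule as stated, not the tokenizer's code: the
names Piece, PieceKind, and split_literal_body are hypothetical, the text is
assumed to be single-byte, and the sketch assumes that empty pieces produce no
token.

```c
/* Kinds of piece produced when a quoted literal is split at escapes.
   These names are illustrative, not Script Manager token codes. */
typedef enum { pieceLiteral, pieceEscape } PieceKind;

typedef struct {
    PieceKind kind;
    long start;    /* offset of the piece within the literal body */
    long length;   /* length of the piece in bytes */
} Piece;

/* Append one piece if there is room; returns the new count. */
static int add_piece(Piece *pieces, int count, int maxPieces,
                     PieceKind kind, long start, long length)
{
    if (count < maxPieces) {
        pieces[count].kind = kind;
        pieces[count].start = start;
        pieces[count].length = length;
        count++;
    }
    return count;
}

/* Split the text between the quote delimiters into pieces.
   Empty pieces are skipped (an assumption of this sketch). */
int split_literal_body(const char *body, long len, char escapeChar,
                       Piece *pieces, int maxPieces)
{
    int count = 0;
    long i = 0, segStart = 0;

    while (i < len) {
        if (body[i] != escapeChar) {
            i++;
            continue;
        }
        if (i > segStart)      /* the portion before the escape */
            count = add_piece(pieces, count, maxPieces,
                              pieceLiteral, segStart, i - segStart);
        count = add_piece(pieces, count, maxPieces, pieceEscape, i, 1);
        i++;
        if (i < len) {         /* the escaped character stands alone */
            count = add_piece(pieces, count, maxPieces, pieceLiteral, i, 1);
            i++;
        }
        segStart = i;
    }
    if (len > segStart)        /* the portion after the last escape */
        count = add_piece(pieces, count, maxPieces,
                          pieceLiteral, segStart, len - segStart);
    return count;
}
```

Applied to the body of the quoted string "This is an escape\n" with backslash as
the escape character, the sketch yields three pieces: the 17 characters before
the backslash, the backslash itself, and the letter n.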


A sequence of whitespace characters becomes a single token. 
Newline, or carriage return, becomes a single token. 


A sequence of alphabetic characters becomes an alphabetic token. If the 
doAlphanumeric flag is set, then alphabetic characters include digits, but the 
first character must be alphabetic. 


A sequence of numeric characters becomes a numeric token. 


A sequence of numeric characters followed by a decimal mark, and optionally 
followed by more numeric characters, becomes a realNumber token. 


Some scripts have not only "English" digits, but also their own numeral codes, 
which of course will be unrecognizable to the typical application. A sequence 
of alternate digits becomes an alternate numeric token. If the strings option 
is selected then the digits will be transliterated to "English" digits. This
includes the realNumber tokens, whose results become alternate real tokens.


The end of the character stream becomes a token. 


A token record consists of a token code, a pointer into the source stream 
(signifying the first character of the sequence that generated the token), the 
byte length of the sequence of characters that generated the token, and space 
for a pointer to a Pascal string, explained next. 


The application may instruct the tokenizer to generate null-terminated,
even-boundaried Pascal strings corresponding to each token. In this case, if the
token is anything but alphabetic or numeric then the text of the source stream
is copied verbatim into the Pascal string. Otherwise, if the text in the source
stream is Roman letters or numbers then those characters are transliterated into
Macintosh eight-bit ASCII and a string is created from the result, allowing
users of other languages to transparently use their own script's numerals or
Roman characters for numbers or keywords. Non-Roman alphabetics are copied
verbatim.


Semantic attributes of byte codes vary from natural language to natural
language. As an example, in the Macintosh character set code $81 is an Å, but
in Kanji this code is the first byte of many double-byte characters, some of
which are alphabetic, some numeric, and some symbols. This information is
retrieved from the 'itl4' resource, which also contains a canonical string
format for the fixed tokens, so that the internal format of formulae can be
redisplayed in the original language.


'itl4' also holds a string copy routine which converts the native text to the
corresponding English (except for alphanumerics). As with the other
international resources, the choice of 'itl4' depends on the script interface
system in use.


Figure 4-IntlTokenize


The untokenTable in the 'itl4' resource contains standard representations for
the fixed tokens, and can be used to display the internal format. An example of
how a user might access this table and use the token information follows:


In Pascal:

Type
   UntokenTable = Record
      len: Integer;
      lastToken: Integer;
      index: array [0..255] of Integer;
         {index table; last = lastToken}
         {list of Pascal strings here. index pointers }
         { are from front of table}
   End;
   UntokenTablePtr = ^UntokenTable;
   UntokenTableHandle = ^UntokenTablePtr;

Function GetUntokenTable( Var x: UntokenTableHandle ): Boolean;
Var
   itl4: itl4Handle;
   p: UntokenTablePtr;
Begin
   GetUntokenTable := false;                 {assume error}
   itl4 := itl4Handle(IUGetIntl(4));         {get itl4 record}
   if itl4 <> nil then begin                 {if ok}
      HLock(Handle(itl4));                   {lock for safety}
      p := UntokenTablePtr(ord(itl4^)+itl4^^.untokenOffset);
                                             {untokenize parts subtable}
      With p^ Do Begin                       {using resource table}
         x := UntokenTableHandle(NewHandle(len));
                                             {make handle of proper size}
         BlockMove(Ptr(p),Ptr(x^),len);      {copy contents}
      End;
      HUnlock(Handle(itl4));                 {free back up}
      GetUntokenTable := true;               {no error}
   End;
End;

If (GetUntokenTable(myUntokenTable)) then
   With curToken^ Do Case theToken OF
      ...
      tokenAlpha:
         AppendString( myVariable[i] );
      Otherwise With myUntokenTable^^, curToken^ Do Begin
         If theToken > lastToken Then Begin
            AppendString( '?' );
         End Else Begin
            sPtr := Pointer(ord(@len) + index[theToken]);
            AppendString(sPtr^);
         End; {if}
      End; {item}
   End; {case}

In C:

struct UntokenTable {
    short len;
    short lastToken;
    short index[256];     /*index table; last = lastToken*/
};

#ifndef __cplusplus
typedef struct UntokenTable UntokenTable;
#endif
typedef UntokenTable *UntokenTablePtr, **UntokenTableHandle;

Boolean GetUntokenTable(UntokenTableHandle *x)
{
    Itl4Handle itl4;
    UntokenTablePtr p;

    *x = nil;                                   /*assume error*/
    itl4 = (Itl4Handle)IUGetIntl(4);            /*get itl4 record*/
    if (itl4) {
        HLock((Handle)itl4);                    /*lock for safety*/
        p = (UntokenTablePtr)((char *)(*itl4) + (*itl4)->unTokenOffset);
        *x = (UntokenTableHandle)NewHandle(p->len);
        if (*x)                                 /*copy only if allocated*/
            BlockMove((Ptr)p, (Ptr)**x, p->len);
        HUnlock((Handle)itl4);                  /*free back up*/
        return (*x != nil);
    }
    else
        return false;
}

if ( GetUntokenTable(&myUntokenTable) )
    switch (curtoken->theToken) {
        ...
        case tokenAlpha:
            AppendString(myvariable[i]);
            break;
        default:
            if (curtoken->theToken > (*myUntokenTable)->lastToken)
                AppendString("?");
            else {
                HLock((Handle)myUntokenTable);
                sptr = (char *)(*myUntokenTable) +
                       (*myUntokenTable)->index[curtoken->theToken];
                AppendString(sptr);
                HUnlock((Handle)myUntokenTable);
            }
            break;
    }



PortionText 
In Pascal: 

Function PortionText (textPtr :Ptr; textLen : Longint): Fixed; {proportion} 
In C: 

pascal Fixed PortionText(Ptr textPtr,long textLen) ; 

This routine returns a result which indicates the proportion of justification
that should be allocated to this text when compared to other text. It is used
when justifying a sequence of format runs, so that the appropriate amount of
extra width is apportioned properly among them. For example, suppose that there
are three format runs on a line: A, B, and C. The line needs to be widened by
11 pixels for justification. Calling PortionText on these format runs yields
the first row in the following table:


                 A       B       C           Total
PortionText:     5.4     7.3     8.2         20.9
Normalized:      .258    .349    remainder   1.00
Pixels (p):      2.84    3.84    remainder   11.0
Rounded (r):     3       4       remainder   11


The proportion of the justification to be allotted to A is 25.8%, so it receives
3 pixels out of 11. In general, to prevent rounding errors,

    r_n = round(\sum_{i=1}^{n} p_i) - \sum_{i=1}^{n-1} r_i

(which can be computed iteratively); e.g., r_B is round(2.84 + 3.84) - 3 = 4,
and r_C is round(11.0) - 7 = 4.
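

The iterative computation can be sketched in C as follows. This is a minimal
sketch, not Script Manager code: the function name apportion_pixels is
hypothetical, the portions are passed as doubles rather than Fixed values, and
they are assumed to be non-negative.

```c
/* Distribute extraPixels among n format runs in proportion to portions[].
   pixelsOut[i] receives the whole number of pixels for run i. Each run
   gets the rounded running total minus what earlier runs already received,
   so the results always add up to extraPixels. */
void apportion_pixels(const double *portions, int n, int extraPixels,
                      int *pixelsOut)
{
    double total = 0.0, running = 0.0;
    int given = 0;   /* pixels handed out to earlier runs */
    int i;

    for (i = 0; i < n; i++)
        total += portions[i];

    for (i = 0; i < n; i++) {
        int upTo = 0;
        running += portions[i];
        if (total > 0.0)   /* round half up; values are non-negative */
            upTo = (int)(running * extraPixels / total + 0.5);
        pixelsOut[i] = upTo - given;
        given = upTo;
    }
}
```

With the portions 5.4, 7.3, and 8.2 and 11 extra pixels, the sketch gives runs
A, B, and C 3, 4, and 4 pixels respectively.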


For normal Roman text, the result is currently a function of the number of
spaces in the text, the number of other characters in the text, and the font
size (the raw size, not ascent + descent + leading). This may change in the
future, so values should be compared at the time of execution.


Figure 5-PortionText: Justifying Format Runs (an 11 pt gap shared among runs
A, B, and C: 5.4, 7.3, 8.2; 25.8%, 34.9%, 39.2%; 2.84 pt, 3.84 pt, 4.32 pt)


FormatOrder


In Pascal:

FormatOrder = array [0..0] of Integer;
FormatOrderPtr = ^FormatOrder;

Procedure GetFormatOrder (ordering: FormatOrderPtr;
                          firstFormat: Integer;
                          lastFormat: Integer;
                          lineRight: Boolean;
                          RLDirProc: Ptr;
                          dirParam: Ptr);

In C:

typedef short FormatOrder[1];
typedef FormatOrder *FormatOrderPtr;

pascal void GetFormatOrder(FormatOrderPtr ordering, short firstFormat,
                           short lastFormat, Boolean lineRight,
                           Ptr rlDirProc, Ptr dirParam);


This routine orders the text properly for display of bidirectional format runs. 
Word processing programs that use this procedure for multi-font text can be 
independent of script text-ordering in a line (e.g., Hebrew or Arabic right-left 
text). The ordering parameter points to an array of integers, with (lastFormat -
firstFormat + 1) entries. The GetFormatOrder routine retrieves the direction of
each format by calling the direction procedure, RLDirProc, which has the 
following format: 


In Pascal: 

Function MyRLDirProc ( theFormat : Integer; dirParam : Ptr) :Boolean; 
In C: 

pascal Boolean MyRLDirProc(short theFormat, Ptr dirParam) ; 


The RLDirProc is called with the values from firstFormat to lastFormat to
determine the direction of each of the format runs. It returns true for
right-left text direction, otherwise false. The parameter dirParam is available
to provide other necessary information for the direction procedure
(for example, a style number or a pointer to a style array).


GetFormatOrder returns a permuted list of the numbers from firstFormat to 
lastFormat. This permuted list can be used to draw or measure the text. 
(For more detail, see the Script Manager developers' packet). The lineRight 
parameter is true if the text is right-left orientation, otherwise false. 


The array ordering is created and filled by your application. The first element
in the array should correspond to the parameter firstFormat, and the last
element should correspond to lastFormat. GetFormatOrder loops through this
array and passes each element in the array back to the RLDirProc function.
Since you fill the ordering array and you write the RLDirProc, you should
store format runs in a way that makes the GetFormatOrder routine usable.


One obvious way to do this would be to declare a record type for format runs 
that allowed you to save things like font style, font ID, script number, and so 
on. You then could store these records in an array. When the time came to call 
GetFormatOrder, you would simply fill the Ordering array with the indexes that 
you used to access your array of format run records. GetFormatOrder would 
return an array which described the correct drawing order for your format runs. 


Consider this example. Let uppercase letters stand for format runs that are 
left to right, and lowercase letters stand for right-left format runs. For 
example, there are two format runs in the following line.


1 2 
ABCfed 


With left-right line direction, the text should appear on the screen as: 


1 2 
ABCdef 


With right-left line direction, the text should appear on the screen as: 


2 1 
fedABC 


GetFormatOrder is used to tell you what order the format runs should be drawn in 
based on line direction for a particular line of text. 
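

The reordering in this example can be sketched in C as follows. This is an
illustration only and is not the algorithm GetFormatOrder uses internally: it
handles a single level of mixed direction, it assumes the resulting list is in
left-to-right display order, and the names sketch_format_order, RunDirProc, and
flags_dir_proc are hypothetical.

```c
/* Hypothetical direction callback: returns nonzero when the given format
   run is right-left. It plays the role that RLDirProc plays above. */
typedef int (*RunDirProc)(int format, void *param);

/* Example callback: param points to an array of flags indexed by run number. */
int flags_dir_proc(int format, void *param)
{
    return ((const int *)param)[format];
}

/* Reverse ordering[lo..hi] in place. */
static void reverse_runs(short *ordering, int lo, int hi)
{
    while (lo < hi) {
        short tmp = ordering[lo];
        ordering[lo++] = ordering[hi];
        ordering[hi--] = tmp;
    }
}

/* Fill ordering[] with the run numbers firstFormat..lastFormat as they
   appear on the screen from left to right. */
void sketch_format_order(short *ordering, int firstFormat, int lastFormat,
                         int lineRight, RunDirProc dirProc, void *param)
{
    int n = lastFormat - firstFormat + 1;
    int i, start;

    for (i = 0; i < n; i++)            /* start in storage order */
        ordering[i] = (short)(firstFormat + i);

    /* Reverse each sequence of runs that runs against the line direction. */
    i = 0;
    while (i < n) {
        start = i;
        while (i < n &&
               (dirProc(firstFormat + i, param) != 0) != (lineRight != 0))
            i++;
        if (i > start)
            reverse_runs(ordering, start, i - 1);
        else
            i++;
    }

    /* A right-left line is laid out from the right edge, so mirror it. */
    if (lineRight)
        reverse_runs(ordering, 0, n - 1);
}
```

For the two runs above (run 1 left-right, run 2 right-left), the sketch
produces the order 1, 2 for a left-right line and 2, 1 for a right-left line.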


Figure 6-GetFormatOrder


For example, in Pascal:

GetFormatOrder(myOrdering, firstFormat, lastFormat,
               GetSysJust <> 0, @MyRLDirProc, nil);
for i := 0 to lastFormat-firstFormat do begin
   with MyFormat[myOrdering[i]], MyStyle[formatStyle] do begin
      TextFont(styleFont);
      {set up other text style features...}
      case what of
         drawing:   DrawText(textStartPtr, formatStart, formatLength);
         measuring: TextWidth(textStartPtr, formatStart, formatLength);
         {and so on}
      end; {case}
   end; {with}
end; {for}

In C:

GetFormatOrder(myOrdering, firstFormat, lastFormat,
               (Boolean)(GetSysJust() != 0), (Ptr)MyRLDirProc, nil);
for (i = 0; i <= (lastFormat-firstFormat); i++) {
    /* set up style stuff */
    switch (what) {
        case drawing:
            DrawText(textStartPtr, formatStart, formatLength);
            break;
        case measuring:
            TextWidth(textStartPtr, formatStart, formatLength);
            break;
        default:
            break;
    }
}


FindScriptRun

In Pascal:

Function FindScriptRun (textPtr: Ptr; textLen: Longint;
                        VAR lenUsed: Longint): ScriptRunStatus;

ScriptRunStatus = RECORD
   script: SignedByte;
   variant: SignedByte;
END;

In C:

pascal struct ScriptRunStatus FindScriptRun(Ptr textPtr, long textLen,
                                            long *lenUsed);

struct ScriptRunStatus {
    SignedByte script;
    SignedByte variant;
};


For compatibility, each script allows Roman text to be mixed in. This routine
is used to break up mixed text (Roman and native) into blocks. On return,
lenUsed is set to the length of the block found at the start of the text, so
the remaining text begins lenUsed bytes after textPtr. The return value reflects
the type of block: the upper byte is the script (0 being Roman text) and the
lower byte is script-specific (script systems can return types of native
sub-scripts, such as Kanji, Katakana and Hiragana for Japanese). For example,
given that the capital letters represent Hebrew text:


In Pascal:

myCharArray := 'abcDEFghi';
textPtr := @myCharArray;
textLen := 9;
blockType := FindScriptRun(textPtr, textLen, lenUsed);
{lenUsed = 3, blockType.script = 0: get remainder of text with: }
textPtr := Ptr(ord(textPtr)+lenUsed);
textLen := textLen-lenUsed;

In C:

char *mychararray = "abcDEFghi";
char *textptr;
long textlen;
struct ScriptRunStatus srs;
long lenUsed;

srs = FindScriptRun(mychararray, (long)strlen(mychararray), &lenUsed);
/* lenUsed would now = 3, srs.script would equal 0 */
/* we can point at the remainder of the text with the following code */
textptr = mychararray + lenUsed;
textlen = strlen(mychararray) - lenUsed;


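

Walking through a whole block of mixed text with FindScriptRun amounts to a
loop that advances the text pointer and shortens the length after each call.
The following C sketch shows the loop with a stand-in for the real routine:
fake_find_script_run and split_into_runs are hypothetical names, and the
stand-in simply treats uppercase ASCII letters as the native script, as in the
'abcDEFghi' example.

```c
#include <ctype.h>

/* Stand-in for FindScriptRun, for illustration only. Uppercase ASCII plays
   the role of the native script. Returns 0 for a Roman block and 1 for a
   native block, and sets *lenUsed to the length of the block that starts
   the text. */
int fake_find_script_run(const char *text, long textLen, long *lenUsed)
{
    int native;
    long i;

    if (textLen <= 0) {
        *lenUsed = 0;
        return 0;
    }
    native = isupper((unsigned char)text[0]) != 0;
    for (i = 1; i < textLen; i++)
        if ((isupper((unsigned char)text[i]) != 0) != native)
            break;
    *lenUsed = i;
    return native;
}

/* Walk the text block by block. Records up to maxRuns block lengths in
   runLens[] and returns the number of blocks recorded. */
int split_into_runs(const char *text, long textLen, long *runLens, int maxRuns)
{
    int count = 0;

    while (textLen > 0 && count < maxRuns) {
        long used;
        fake_find_script_run(text, textLen, &used);
        runLens[count++] = used;
        text += used;        /* point at the remainder of the text */
        textLen -= used;
    }
    return count;
}
```

For 'abcDEFghi' the loop finds three blocks of three bytes each.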
StyledLineBreak 
In Pascal:

Function StyledLineBreak(textPtr: Ptr;
                         textLen: Longint;
                         textStart: Longint;
                         textEnd: Longint;
                         flags: Longint;
                         Var textWidth: Fixed;     {on exit, set if too long}
                         Var textOffset: Longint)
                         : StyledLineBreakCode;

StyledLineBreakCode = (smBreakWord, smBreakChar, smBreakOverflow);

In C:

pascal StyledLineBreakCode StyledLineBreak(Ptr textPtr, long textLen,
                                           long textStart, long textEnd,
                                           long flags, Fixed *textWidth,
                                           long *textOffset);

enum {smBreakWord, smBreakChar, smBreakOverflow};
typedef unsigned char StyledLineBreakCode;


This routine breaks a line on a word boundary. The application loops through a
sequence of format runs, resetting the textPtr and textLen each time the script
changes, and resetting the textStart and textEnd for each format run. The
textWidth is automatically decremented by StyledLineBreak.


TextPtr points to the start of the text, textLen indicates the maximum length of
the text, and the textWidth parameter indicates the maximum pixel width of the
rectangle used to display the text starting at the textStart and ending at the
textEnd. The flags parameter is reserved for future expansion and must be zero.


Figure 7-StyledLineBreak


On input, a non-zero textOffset indicates that this is the first format run
(possibly forcing a character break rather than a word break: if textOffset is
non-zero, at least one character will be returned if the line is not empty). On
output it is the number of bytes from textPtr up to the point where the line
should be broken. If the passed textWidth extends beyond the end of the text
(i.e., is larger than the width from textOffset to textLen), then the width of
the text is subtracted from the textWidth and the result is returned in the
textWidth parameter. This can be used for the next format run.


The routine result indicates whether the routine broke on a word boundary, 
character boundary, or the width extended beyond the edge of the text. 


When used with single-format text, the textStart can be zero, and the textEnd 
identical with the textLen. With multi-format text, the interval between 
textStart and textEnd specifies a format run. The interval between textPtr and 
textLen specifies a script run (a contiguous sequence of text where the script 
of each of the format runs is the same). Note that the format runs in 
StyledLineBreak must be traversed in back-end storage order, not display order 
(see GetFormatOrder). 


In other words, if the current format run is included in a contiguous sequence 
of other format runs of the same script, then the textPtr should point to the 
start of the first format run of the same script, while the textLen should 
include the last format run of the same script. This is so that word boundaries 
can extend across format runs; they will never extend across script runs. 


Although the offsets are Longint values and the widths are Fixed to allow for
future extensions, in the current version the Longint values should be
restricted to the integer range, and only the integer portion of the widths
will be used.


VisibleLength
In Pascal: 
FUNCTION VisibleLength ( textPtr : Ptr; textLen: Longint): Longint; 
In C: 
pascal long VisibleLength(Ptr textPtr,long textLen); 

This routine returns the length of the text excluding trailing white space,
taking into account the script of the text. Trailing white space is only
excluded if it occurs on the visible right side, in display order.


Figure 8-VisibleLength (VisibleLength of the left-right example = 3;
VisibleLength of the right-left example = 5)

For example, in Pascal:

myVisibleLength := VisibleLength(myText, myOffset);
curSlop := myPixel - TextWidth(myText, 0, myVisibleLength);
DrawJust(myText, myVisibleLength, curSlop);
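

For left-right Roman text, trailing white space in storage order is also on the
visible right side, so the visible length is the length with trailing blanks
removed. The following C sketch covers only that case: the name
visible_length_lr is hypothetical, only space and tab are treated as white
space, and right-left scripts (where trailing white space falls on the visible
left and is not excluded) are not handled.

```c
/* Length of text[0..textLen-1] without trailing spaces or tabs.
   Valid only for left-right text, where the end of storage is the
   visible right side of the line. */
long visible_length_lr(const char *text, long textLen)
{
    while (textLen > 0 &&
           (text[textLen - 1] == ' ' || text[textLen - 1] == '\t'))
        textLen--;
    return textLen;
}
```

For a five-byte left-right text made of three letters followed by two spaces,
the sketch returns 3.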


UprText and LwrText 
In Pascal: 


Procedure UprText(textPtr: Ptr; len: Integer); 
Procedure LwrText(textPtr: Ptr; len: Integer); 


In C: 


pascal void UprText(Ptr textPtr,short len); 
pascal void LwrText(Ptr textPtr,short len); 


UprText provides a Pascal interface to the UprString assembly routine, which
will uppercase text up to 32K in length. The LwrText routine provides the
corresponding lowercase routine. Neither routine changes the number or position
of characters in a string, but both are faster and simpler than the
Transliterate routine.


Text Comparison


We have done some performance analyses of the Pack6 comparison routines and,
based upon those, were able to increase performance by about 50% on average.
This increase results in a corresponding increase in 4th Dimension sorting
performance, for example. Also, a long-standing bug in sorting "œ" and "æ" has
been corrected. A test program on the Macintosh SE comparing "The quick brown
fox jumped over the lazy dog" to variants produced the following decreases in
comparison time:


Identical text:                                      94%
Last Character Unequal (g vs. X):                    83%
Last Character Weakly Equal (g vs. G):               82%
First Character Unequal (T vs. X):                   59%
First Character Weakly Equal (T vs. t):              29%
All Characters Weakly Equal (T vs. t ... g vs. G):   25%


Part of the performance increase results from internal caching of 'itl'
resources. Originally all 'itl' resources (resulting from IUGetIntl of
0,1,2,4) were cached, but several programs do a _ReleaseResource or
_DetachResource on 'itl0', rendering the cache invalid. Because of this,
currently only 'itl2' and 'itl4' are cached. Developers must be sure not to
release or detach these resources. Also, only the system file resources are
used, so they cannot be overridden by copies in the application or document
resource forks.


Figure 9-International Text Comparison


The Macintosh date routines are extended to provide a larger range (roughly 35 
thousand years), and more information. This extension allows programs that need 
a larger range of dates to use system routines rather than produce their own, 
which may not be internationally compatible. The programmer can also access the 
stored location (latitude and longitude) and time zone of the Macintosh from 
parameter RAM. The Map cdev gives users the ability to change and reference 
these values. 


The long internal format of a date is as before, in seconds since 12:00 
midnight, January 1, 1904, but is represented as a signed 64-bit integer (SANE 
Comp format), allowing a somewhat larger range (roughly 500 billion years). 
Short internal format dates (since they are unsigned) can be converted to long 
format by filling the top 32 bits with zero; long formats can be converted to 
short by truncating (assuming that they are within range). When storing in 
files, a five (or six) byte format can be used for a range of roughly 35 
thousand years. This value should be sign-extended to restore it to a Comp 
format. 
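

The conversions described above can be sketched in C as follows. This is an
illustration rather than system code: the function names are hypothetical,
int64_t from <stdint.h> stands in for the SANE Comp type, and the five-byte
file format is assumed to be stored most significant byte first.

```c
#include <stdint.h>

/* Short (unsigned 32-bit) to long format: the top 32 bits are zero. */
int64_t short_to_long_date(uint32_t secs)
{
    return (int64_t)secs;
}

/* Long to short format by truncation. The caller is responsible for
   checking that the value is within the short format's range. */
uint32_t long_to_short_date(int64_t secs)
{
    return (uint32_t)((uint64_t)secs & 0xFFFFFFFFu);
}

/* Store the low 40 bits of a long date in five bytes, high byte first. */
void pack_date5(int64_t secs, unsigned char out[5])
{
    uint64_t bits = (uint64_t)secs;
    int i;

    for (i = 4; i >= 0; i--) {
        out[i] = (unsigned char)(bits & 0xFF);
        bits >>= 8;
    }
}

/* Read five bytes back and sign-extend them to the 64-bit format. */
int64_t unpack_date5(const unsigned char in[5])
{
    uint64_t bits = 0;
    int i;

    for (i = 0; i < 5; i++)
        bits = (bits << 8) | in[i];
    if (bits & ((uint64_t)1 << 39))           /* sign bit of the 40-bit value */
        return (int64_t)bits - ((int64_t)1 << 40);
    return (int64_t)bits;
}
```

A signed 40-bit count of seconds spans about 2^40 seconds, which is the
roughly 35 thousand year range mentioned above.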
LongDateTime


In Pascal:

Type LongDateTime = Comp;

In C:

typedef comp LongDateTime;


The standard date conversion record is extended using a new structure:


In Pascal:

LongDateRec = Record
   case Integer of
      0: (era,year,month,day,hour,minute,second,
          dayOfWeek,dayOfYear,weekOfYear,
          pm,res1,res2,res3: Integer);
      1: (list: array [LongDateField] of Integer);
      2: (eraAlt: Integer;
          oldDate: DateTimeRec);
End;

In C:

union LongDateRec {
    struct {
        short era;
        short year;
        short month;
        short day;
        short hour;
        short minute;
        short second;
        short dayOfWeek;
        short dayOfYear;
        short weekOfYear;
        short pm;
        short res1;
        short res2;
        short res3;
    } ld;
    short list[14];          /*Index by LongDateField!*/
    struct {
        short eraAlt;
        DateTimeRec oldDate;
    } od;
};

The default calendar for converting to and from the long internal format is the
Gregorian calendar. The era field for this calendar has values 0 for A.D. and
-1 for B.C. (Note that the international date string conversion routines do not
append strings for A.D. or B.C.) The current range allowed in conversion is
roughly 30,000 B.C. to 30,000 A.D.


(Note that in different countries the change from the Julian calendar to 
Gregorian calendar occurred in different years: in Catholic countries, it 
occurred in 1582, while in Russia it took place as late as 1917. Dates before 
these years in those countries should use the Julian calendar for conversion. 
The Julian calendar differs from the Gregorian by three days every four
centuries.)
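

The difference of three days every four centuries arises because the Gregorian
calendar omits the leap day in century years that are not divisible by 400. The
following C sketch computes the resulting gap between the two calendars. It
uses standard calendar arithmetic and is not taken from the Script Manager; the
name julian_gregorian_gap is hypothetical, and the result is assumed to apply
to A.D. dates from March 1 of the given year onward.

```c
/* Number of days by which a Gregorian date is ahead of the Julian date
   for the same day, for dates from March 1 of the given A.D. year. */
int julian_gregorian_gap(int year)
{
    return year / 100 - year / 400 - 2;
}
```

The sketch gives a gap of 10 days for 1582 and 13 days for 1917, and the gap
grows by 3 days between 1600 and 2000.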


Figure 10-Long Date <-> String


InitDateCache 
In Pascal:

Function InitDateCache (theCache: DateCachePtr): OSErr;

In C:

pascal OSErr InitDateCache(DateCachePtr theCache);


This routine must be called to initialize the theCache record before using the
String2Date or String2Time routines. Allocation of this record is the
responsibility of the caller: it can be a local variable, a Ptr, or a locked
Handle. Using this cache improves the performance of the String2Date and
String2Time routines.


In Pascal:

Procedure MyRoutine;
Var
   myCache: DateCacheRecord;
   err: OSErr;
Begin
   err := InitDateCache(@myCache);
   {call the String2Date or Time routines. Note that if you are }
   { doing this inside an application where global variables are }
   { allowed, you should probably make your Date cache a global and }
   { initialize it once, when you initialize the Toolbox Managers. }
End;


In C:

void MyRoutine()
{
    DateCacheRecord myCache;

    InitDateCache(&myCache);

    /* Now you can call String2Date or String2Time. Note that if you
       are doing this inside an application where global variables are
       allowed, you should probably make your Date cache a global and
       initialize it once when you initialize the Toolbox managers.
    */
}

String2Date and String2Time 


In Pascal:

Function String2Date(textPtr: Ptr;
                     textLen: Longint;
                     theCache: DateCachePtr;
                     Var lengthUsed: Longint;
                     Var dateTime: LongDateRec)
                     : String2DateStatus;

Function String2Time(textPtr: Ptr;
                     textLen: Longint;
                     theCache: DateCachePtr;
                     Var lengthUsed: Longint;
                     Var dateTime: LongDateRec)
                     : String2DateStatus;

In C:

pascal String2DateStatus String2Date(Ptr textPtr, long textLen,
                                     DateCachePtr theCache, long *lengthUsed,
                                     LongDateRec *dateTime);

pascal String2DateStatus String2Time(Ptr textPtr, long textLen,
                                     DateCachePtr theCache, long *lengthUsed,
                                     LongDateRec *dateTime);


These routines expect a date and time at the beginning of the text. They parse 
the text, setting the lengthUsed to reflect the remainder of the text, and fill 
the dateTime record. They recognize all the strings that are produced by the 
international date and time utilities, and others. For example, they will 
recognize the following dates: September 1, 1987; 1 Sept 1987; 1/9/1987; and 1 
1987 sEpT. 


If the value of the input year is less than 100, then it is added to 1900; if
less than 1000, then it is added to 1000 (the appropriate values are used from
other calendars, gotten from the base date: LongDateTime = 0). Thus the dates
1/9/1987 and 1/9/87 are equivalent.
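

The padding rule for the Gregorian calendar can be sketched in C as follows.
The function name pad_year is hypothetical, the base year is assumed to be
1904, and non-negative input is assumed; other calendars would take their
values from their own base date.

```c
/* Pad a short year on the left using the digits of the base year 1904:
   two-digit years land in the 1900s, three-digit years in the 1000s. */
int pad_year(int year)
{
    if (year < 100)
        return 1900 + year;
    if (year < 1000)
        return 1000 + year;
    return year;
}
```

With this rule, 87 and 1987 both yield 1987.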


The routines use the following grammar to interpret the date and time. The 
relevant fields of the international utilities resources are used for 
separators, month and weekday names, and the ordering of the date elements. The 
parsing is actually semantic-driven, so finer distinctions are made than those 
represented in the syntax diagram. 


time   = number [tSep number [tSep number]] [mornStr | eveStr | timeSuff]
tSep   = timeSep | sep

date   = [dSep] dField [dSep dField [dSep dField [dSep dField [dSep]]]]
dField = number | dayOfWeek | abbrevMonth | month
dSep   = dateSep | st0 | st1 | st2 | st3 | st4 | sep

sep    = <non-alphanumeric>


The date defaults are the current day, month and year. The time defaults to 
00:00:00. The digits in a year are padded on the left, using the base date 
(the date corresponding to zero seconds: Jan 1, 1904). This routine uses the 
tokenizer to separate the components of the strings. It depends upon the names 
of the months and weekdays used from international resources being single 
alphanumeric tokens. 


Note that the date routine only fills in the year, month, day and dayOfWeek; the 
time routine fills in only the hour, minute and second. Thus the two routines 
can be called sequentially to fill complementary values in the LongDateRec. 


The return from the routine is a set of bits that indicate confidence levels,
with higher numbers indicating low confidence in how closely the input string
matched what the routine expected. For example, inputting a time of 12.43.36
will work, but return a message indicating that the separator was not standard.
This can also be used to parse a string containing both the date and time, by
using the confidence levels to determine which portion comes first. The
returned bits include:


In Pascal:

fatalDateTime     = $8000;
longDateFound     = 1;
leftOverChars     = 2;
sepNotIntlSep     = 4;
fieldOrderNotIntl = 8;
extraneousStrings = 16;
tooManySeps       = 32;
sepNotConsistent  = 64;
tokenErr          = $8100;
cantReadUtilities = $8200;
dateTimeNotFound  = $8400;
dateTimeInvalid   = $8800;


In C:

#define fatalDateTime       0x8000
#define longDateFound       1
#define leftOverChars       2
#define sepNotIntlSep       4
#define fieldOrderNotIntl   8
#define extraneousStrings   16
#define tooManySeps         32
#define sepNotConsistent    64
#define tokenErr            0x8100
#define cantReadUtilities   0x8200
#define dateTimeNotFound    0x8400
#define dateTimeInvalid     0x8800
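

An application might examine the returned bits along the following lines. This
C sketch is only one possible reading of the values listed above: it treats any
result with the fatalDateTime bit set as a failure (all of the error codes
include that bit) and counts the remaining warning bits. The function names are
hypothetical, and the constants are repeated as an enum so that the sketch is
self-contained.

```c
/* Result bits, with the values listed above. */
enum {
    fatalDateTime     = 0x8000,
    longDateFound     = 1,
    leftOverChars     = 2,
    sepNotIntlSep     = 4,
    fieldOrderNotIntl = 8,
    extraneousStrings = 16,
    tooManySeps       = 32,
    sepNotConsistent  = 64,
    tokenErr          = 0x8100,
    cantReadUtilities = 0x8200,
    dateTimeNotFound  = 0x8400,
    dateTimeInvalid   = 0x8800
};

/* Nonzero if the result is one of the fatal error codes. */
int date_parse_failed(unsigned short status)
{
    return (status & fatalDateTime) != 0;
}

/* Number of warning bits set, or -1 if the parse failed outright.
   More warnings means less confidence in the match. */
int date_parse_warnings(unsigned short status)
{
    unsigned short warn;
    int count = 0;

    if (date_parse_failed(status))
        return -1;
    warn = status & (leftOverChars | sepNotIntlSep | fieldOrderNotIntl |
                     extraneousStrings | tooManySeps | sepNotConsistent);
    while (warn) {
        count += warn & 1;
        warn >>= 1;
    }
    return count;
}
```

A result of longDateFound combined with sepNotIntlSep, as might be returned for
a time such as 12.43.36, counts as a successful parse with one warning.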


LongDate2Secs and LongSecs2Date 


In Pascal: 


Procedure LongDate2Secs(lDate: LongDateRec; Var lSecs: LongDateTime) ; 


Procedure LongSecs2Date(lSecs: LongDateTime; Var lDate: LongDateRec);


In C: 


pascal void LongDate2Secs(const LongDateRec *lDate, LongDateTime *lSecs);


pascal void LongSecs2Date(LongDateTime *lSecs,LongDateRec *lDate) ; 


These routines extend the range of the Macintosh calendar as discussed above.
Any fields that are not used should be zeroed.


On input, the LongDate2Secs routine uses the day and month unless the day is
zero; otherwise it uses the dayOfYear unless that is zero; otherwise it uses
the dayOfWeek and weekOfYear.


Other fields are additive: if you supply a month of 37, it is interpreted as
adding 3 to the year and using a month of 1. This property is subject to some
restrictions imposed by the internal arithmetic: for example, the absolute
value of hour*60+minute must be less than 32767.
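The additive behavior of the month field can be sketched in C as follows. This
is only an illustration of the arithmetic described above, not the Toolbox
implementation; normalize_month is a hypothetical helper, and its handling of
months below 1 is an assumption the text does not confirm.

```c
/* Hypothetical helper: fold an out-of-range 1-based month into the year,
   so that month 37 becomes "year + 3, month 1". */
static void normalize_month(long *year, long *month)
{
    long zeroBased = *month - 1;      /* 0 = January */
    long carry = zeroBased / 12;      /* whole years contained in the month */
    long rem = zeroBased % 12;

    if (rem < 0) {                    /* assumed behavior for months below 1 */
        rem += 12;
        carry -= 1;
    }
    *year += carry;
    *month = rem + 1;
}
```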

Two new interfaces have been added to Pack6 for LongDate support:

In Pascal: 


IULDateString(dateTime: LongDateTime; form: DateForm; Var Result: Str255; 
intlParam: Handle) ; 


Assembly selector: 20 


IULTimeString(dateTime: LongDateTime; wantSeconds: BOOLEAN; Var Result:Str255; 
intlParam: Handle) ; 


Assembly selector: 22 
In C: 


pascal void IULDateString(LongDateTime *dateTime, DateForm longFlag,
Str255 result, Handle intlParam) ; 


pascal void IULTimeString(LongDateTime *dateTime, Boolean wantSeconds, 
Str255 result, Handle intlParam) ; 


These routines take a LongDateTime, and return a formatted string. Only the old
fields year..second, and dayOfWeek are used. If the intlParam is zero, then the
international resource 0 ('itl0') is used. The output year is limited to four
digits: e.g., from 1 to 9999 A.D.


ToggleDate and ValidDate 

In Pascal: 

Function ToggleDate (Var mySecs: LongDateTime; field: LongDateField; 
delta: DateDelta; ch: Integer; 
params: TogglePB) :ToggleResults; 


Function ValidDate (Var date : LongDateRec; flags: Longint; 
Var newSecs: LongDateTime) : Integer; 


In C: 
pascal ToggleResults ToggleDate(LongDateTime *lSecs, LongDateField field,
DateDelta delta,short ch, 
const TogglePB *params) ; 


pascal short ValidDate(LongDateRec *vDate,long flags,LongDateTime *newSecs) ; 


The ToggleDate routine is used to modify a date or time record by toggling one
of the fields up or down. The routine returns a valid date by performing two
types of action. If the affected field overflows or underflows, it wraps to the
corresponding low or high value. If changing the affected field causes other
fields to become invalid, a nearby valid date is selected (which may cause
other fields to change). For example, toggling the year upwards from
February 29, 1980 results in March 1, 1981. Currently only the fields
year..second and am can be toggled, although this should change in the future.


The routine will also toggle by character if delta = 0. The character is used
to change the field in the following way. If it is a digit, it is appended to
the end of the field, and the field is then modified to be valid, in a manner
similar to the alarm clock. For example, to replace a minute value of 54 with
23 by entering characters, typing 2 first changes the minute to 42, and typing
3 then changes it to 23. The AM/PM field also accepts letters.
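The digit-entry behavior in the example above can be modeled with a small C
sketch. This is one plausible reading of the description for a two-digit
field; the helper append_digit is hypothetical, and the actual validation
performed by ToggleDate may differ.

```c
/* Sketch only: append a typed digit to a two-digit field. The last digit of
   the old value is kept and the new digit is added after it; if the result
   is still out of range, only the new digit is kept. */
static int append_digit(int value, int digit, int maxValue)
{
    int candidate = (value % 10) * 10 + digit;

    if (candidate > maxValue)
        candidate = digit;
    return candidate;
}
```

With a minute field (maximum 59), starting from 54 and typing 2 and then 3
yields 42 and then 23, matching the example in the text.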


In Pascal:


TogglePB = RECORD
    togFlags: LONGINT;
    amChars:  ResType;                    {from intl0}
    pmChars:  ResType;                    {from intl0}
    reserved: ARRAY [0..3] OF LONGINT;
END;


In C:
struct TogglePB {
    long    togFlags;
    ResType amChars;                      /*from intl0*/
    ResType pmChars;                      /*from intl0*/
    long    reserved[4];
};


The parameter block should be set up as follows. It should contain the 
uppercase versions of the AM and PM strings to match (the defaults mornStr and 
eveStr can be copied from the international utilities using IUGetIntl, and 
converted to uppercase with UprText). 


The ToggleDate routine makes an internal call to ValidDate, which can also be
called directly by the user. ValidDate checks the date record for correctness,
using the params.togFlags value passed to it by ToggleDate. If any of the
record fields are invalid, ValidDate returns a DateField value corresponding to
the field in error. Otherwise, it returns -1.


The flags value has the same meaning for ToggleDate (params.togFlags) and
ValidDate (flags). The low word bits correspond to the values in the enumerated
type DateField. For example, to check the validity of the year field you can
create a mask by doing the following:


yearFieldMask = 2**yearField;


The high word of the flags value can be used to set various other conditions.
The only one currently used is a flag which can be set to restrict the range of
valid dates to the short date format (smallDateBit = 31; smallDateMask =
$80000000). All other bits are reserved and should be set to zero. The
reserved values in the parameter block should also be zeroed.


togFlags should normally be set to $007F, which can be done by using the
predeclared constant dateStdMask.
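The mask arithmetic can be sketched in C as shown below. The DateField
ordering used here (era = 0 through second = 6) is an assumption, inferred
from $007F covering the seven low bits; check the interface files before
relying on these values. The helpers field_mask and standard_mask are
hypothetical.

```c
/* Assumed DateField ordering; verify against the interface files. */
enum { eraField, yearField, monthField, dayField,
       hourField, minuteField, secondField };

#define smallDateBit   31
#define smallDateMask  0x80000000UL
#define dateStdMask    0x007FUL

/* Mask for a single DateField value: 2 ** field. */
static unsigned long field_mask(int field)
{
    return 1UL << field;
}

/* OR together the masks for era through second. */
static unsigned long standard_mask(void)
{
    unsigned long mask = 0;
    int f;

    for (f = eraField; f <= secondField; f++)
        mask |= field_mask(f);
    return mask;
}
```

Under the assumed ordering, standard_mask() produces the same value as
dateStdMask, and OR-ing in smallDateMask restricts validation to the short
date range.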


Figure 11—ToggleDate 


ReadLocation and WriteLocation 
In Pascal: 


PROCEDURE ReadLocation(VAR loc: MachineLocation);
PROCEDURE WriteLocation(loc: MachineLocation);


In C:


pascal void ReadLocation(MachineLocation *loc); 
pascal void WriteLocation(const MachineLocation *loc); 


These routines allow the programmer to access the stored geographic location of 
the Macintosh and time zone information from parameter RAM. For example, the 
time zone information can be used to derive the absolute time (GMT) that a 
document or mail message was created. With this information, when the document 
is received across time zones, the creation date and time are correct. 
Otherwise, documents can appear to be created "after" they are read (e.g., I can
create a message in Tokyo on Tuesday and send it to Cupertino, where it is 
received and read on Monday). Geographic information can also be used by 
applications which require it. 


If the MachineLocation has never been set, then it should be <0,0,0>. The top 
byte of the gmtDelta should be masked off and preserved when writing: it is 
reserved for future extension. The gmtDelta is in seconds east of GMT: e.g., 
San Francisco is at minus 28,800 seconds (8 hours * 3600 seconds per hour). The 
latitude and longitude are in fractions of a great circle, giving them accuracy 
to within less than a foot, which should be sufficient for most purposes. For 
example, Fract values of 1.0 = 90°, -1.0 = -90°, -2.0 = -180°. 
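The unit conversions above can be checked with a short C sketch. It assumes
that Fract is a signed 32-bit fixed-point value with 30 fraction bits, so that
1.0 is stored as 0x40000000; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: Fract is a 2.30 fixed-point value (1.0 == 0x40000000). */
typedef int32_t Fract;

/* Convert a latitude or longitude Fract to degrees (1.0 = 90 degrees). */
static double fract_to_degrees(Fract f)
{
    return ((double)f / 1073741824.0) * 90.0;
}

/* Convert whole hours east of GMT to a gmtDelta value in seconds. */
static long hours_east_to_gmt_delta(int hoursEast)
{
    return hoursEast * 3600L;
}
```

For San Francisco, eight hours west of GMT, hours_east_to_gmt_delta(-8)
gives -28800, as stated in the text.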


In Pascal:


MachineLocation = RECORD
    latitude:  Fract;
    longitude: Fract;
    CASE INTEGER OF
      0: (dlsDelta: SignedByte);    {signed byte; daylight savings delta}
      1: (gmtDelta: LONGINT);       {must mask - see documentation}
END;


In C:
struct MachineLocation {
    Fract latitude;
    Fract longitude;
    union {
        char dlsDelta;    /*signed byte; daylight savings delta*/
        long gmtDelta;    /*must mask - see documentation*/
    } gmtFlags;
};


The gmtDelta is really a three-byte value, so the user must take care to get and 
set it properly as in the following code examples: 


In Pascal: 


Function GetGmtDelta(myLocation: MachineLocation): Longint;
Var
    internalGmtDelta: Longint;
Begin
    With myLocation Do Begin
        internalGmtDelta := BAnd(gmtDelta, $00FFFFFF);       {get value}
        If BTst(internalGmtDelta, 23)                        {sign extend}
            Then internalGmtDelta := BOr(internalGmtDelta, $FF000000);
        GetGmtDelta := internalGmtDelta;
    End;
End;


Procedure SetGmtDelta(Var myLocation: MachineLocation; myGmtDelta: Longint);
Var
    tempSignedByte: SignedByte;
Begin
    With myLocation Do Begin
        tempSignedByte := dlsDelta;         {save the top byte}
        gmtDelta := myGmtDelta;
        dlsDelta := tempSignedByte;         {restore the top byte}
    End;
End;


In C:
long GetGmtDelta(MachineLocation myLocation)
{
    long internalGmtDelta;

    internalGmtDelta = myLocation.gmtFlags.gmtDelta & 0x00ffffff;  /* get value */
    if ( (internalGmtDelta >> 23) & 1 )            /* need to sign extend */
        internalGmtDelta = internalGmtDelta | 0xff000000;
    return(internalGmtDelta);
}


void SetGmtDelta(MachineLocation *myLocation, long myGmtDelta)
{
    char tempSignedByte;

    tempSignedByte = myLocation->gmtFlags.dlsDelta;    /* save the top byte */
    myLocation->gmtFlags.gmtDelta = myGmtDelta;
    myLocation->gmtFlags.dlsDelta = tempSignedByte;    /* restore the top byte */
}


Figure 12—Locations 
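The routines above rely on dlsDelta overlaying the top byte of gmtDelta, which
holds on the big-endian Macintosh. As a portable illustration of the same
packing, the C sketch below treats the four bytes as one unsigned 32-bit
value. The helper names are hypothetical, and the layout (top byte dlsDelta,
low three bytes gmtDelta) is taken from the description in the text.

```c
#include <stdint.h>

/* Combine the daylight savings byte and the 24-bit signed gmtDelta. */
static uint32_t pack_gmt_flags(int8_t dlsDelta, int32_t gmtDelta)
{
    return ((uint32_t)(uint8_t)dlsDelta << 24) |
           ((uint32_t)gmtDelta & 0x00FFFFFFu);
}

/* Extract gmtDelta, sign-extending from bit 23. */
static int32_t unpack_gmt_delta(uint32_t gmtFlags)
{
    uint32_t value = gmtFlags & 0x00FFFFFFu;

    if (value & 0x00800000u)                  /* bit 23 set: negative */
        return (int32_t)value - 0x01000000;
    return (int32_t)value;
}

/* Extract the daylight savings delta from the top byte. */
static int8_t unpack_dls_delta(uint32_t gmtFlags)
{
    return (int8_t)(uint8_t)(gmtFlags >> 24);
}
```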


Setting Latitude, Longitude, and Time Zone cdev 


This new Control Panel module on the utilities disk allows the user to set the 
latitude, longitude, and time zone. The values are stored in parameter RAM on 
the host machine. (See the Map cdev documentation for more details.)
Figure 13—Map 


The new number routines supplement SANE, allowing applications to display
formatted numbers in the manner of Microsoft Excel or Fourth Dimension, and to
read both formatted and simple numbers. The formatting strings allow natural
display and entry of numbers, and editing of format strings, even though the
original numbers and the format strings were entered in a language other than
that of the final user.


Number parsing is based on a NumberParts table that describes the essentials of 
numeric display for a particular language, including such components as 
thousands separator, decimal point, scientific notation, forced zeroes in the 
absence of significant digits, etc. A default NumberParts table for each 
locale's system resides in the 'itl4' resource for that system. 


NumberParts 
In Pascal:


NumberParts = RECORD
    version: integer;
    data: array [tokLeftQuote..tokMaxSymbols] OF WideChar;
    pePlus, peMinus, peMinusPlus: WideCharArr;
    altNumTable: WideCharArr;
    reserved: packed array [0..19] of Char;    {must be zeroed!}
END;


In C:
struct NumberParts {
    short       version;
    WideChar    data[31];    /*index by [tokLeftQuote..tokMaxSymbols]*/
    WideCharArr pePlus;
    WideCharArr peMinus;
    WideCharArr peMinusPlus;
    WideCharArr altNumTable;
    char        reserved[20];
};


Here is an example of how to access the 'itl4' default NumberParts table:

In Pascal:
Function DefaultParts(Var x: NumberParts): Boolean;
Var
    itl4: Itl4Handle;
Begin
    DefaultParts := false;                          {assume error}
    itl4 := Itl4Handle(IUGetIntl(4));               {get itl4 record}
    if itl4 <> nil then begin                       {if ok}
        x := NumberPartsPtr(ord(itl4^) + itl4^^.defPartsOffset)^;
                                                    {numberParts subtable}
        DefaultParts := true;                       {no error}
    end;
End;

In C:
Boolean DefaultParts(NumberParts *x)
{
    Itl4Handle itl4;

    itl4 = (Itl4Handle) IUGetIntl(4);               /* get itl4 record */
    if ( itl4 ) {
        *x = *((NumberPartsPtr)( (char *) (*itl4) +
                                 ((*itl4)->defPartsOffset) ));
        return(1);                                  /* no error */
    }
    return(0);                                      /* error */
}

The user provides a format descriptor string. This format string is translated
by Str2Format into a canonical format that is transportable between different
languages. The canonical format is stored in a record called NumFormatString.
This record's structure is as follows:


In Pascal:


NumFormatString = PACKED RECORD
    fLength:  Byte;
    fVersion: Byte;
    data:     PACKED ARRAY [0..253] OF SignedByte;    {private data}
END;


In C:


struct NumFormatString {
    char fLength;
    char fVersion;
    char data[254];    /*private data*/
};


The format descriptor string may be broken into as many as three parts: 
positive, negative, and zero. For example, the number 3456.713 used with the 
canonical format produced from "#,###.#; (#,###.#)" will produce the string 
representation "3,456.7" in the United States. In Switzerland the same 
canonical format would be displayed as "#.###,#; (#.###,#)," and the number 
displayed with this format would be "3.456,7." 
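To illustrate how the same value yields different strings under different
separator conventions, here is a minimal C sketch. It does not use the Script
Manager or a NumberParts table; format_grouped is a hypothetical helper that
handles only non-negative values with one decimal place, and it assumes the
default "C" locale for snprintf.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: format x with one decimal place, inserting thousSep between
   groups of three integer digits and using decSep as the decimal point. */
static void format_grouped(double x, char thousSep, char decSep,
                           char *out, size_t outSize)
{
    char plain[64];
    char result[96];
    const char *dot;
    size_t i, intLen, j = 0;

    snprintf(plain, sizeof plain, "%.1f", x);      /* e.g. "3456.7" */
    dot = strchr(plain, '.');
    intLen = dot ? (size_t)(dot - plain) : strlen(plain);

    for (i = 0; i < intLen; i++) {
        if (i > 0 && (intLen - i) % 3 == 0)
            result[j++] = thousSep;                /* group boundary */
        result[j++] = plain[i];
    }
    if (dot) {
        result[j++] = decSep;
        strcpy(&result[j], dot + 1);               /* fractional digits */
    } else {
        result[j] = '\0';
    }
    snprintf(out, outSize, "%s", result);
}
```

Calling the helper with (',', '.') gives the United States form of 3456.713,
and calling it with ('.', ',') gives the Swiss form quoted in the text.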


The number formats include the following features (the defaults for the U.S. are 
listed following): 


Separators: 


decimal separator (.), thousands separator (,) 


Example: format string: ###,###.0##,###
1 -> 1.0
1234 -> 1,234.0
3.141592 -> 3.141,592


Digits:


zero digit (0), skipping digit (#), padding digit (*), padding value (NBSP) 


Example: format string: ###; (000) ;*** 
1 —> 1 

-1 —> (001) 

0 —> 0 


The number format routines always fill in digits from the right or 
from the left of the decimal point. 


Example: format string: ###‘foo'###
123foo456 -> 123foo456

220044 -> 200244

123foo -> 123

Example: format string: 0.###‘foo'###
0.foo123 -> 0.123

0.1foo456 -> 0.145foo6

0.1456 -> 0.145foo6


Formats using zero and skipping digit characters do not allow extension
beyond the minimum number of digits specified to the right or left of
the decimal place. For example, users must provide the desired maximum
digits on the left: e.g., #,###,### instead of #,###. FormatX2Str will
return a result of formatOverflow when the number contains more digits
to the left of the decimal point than specified in the format string.
Input values with more digits to the right of the decimal point than
there are digits allowed in the format string will be rounded on output.


Example: format string: ###.### 
1234.56789 —> formatOverflow on output 
1.234999 —> 1.235 
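The overflow and rounding rules in this example can be modeled with a short C
sketch. This is not the FormatX2Str implementation: format_limited is a
hypothetical helper for non-negative values, FORMAT_OVERFLOW is a stand-in for
the formatOverflow result, and trailing zeros are not suppressed the way
skipping digits would suppress them.

```c
#include <stdio.h>
#include <string.h>

#define FORMAT_OK        0
#define FORMAT_OVERFLOW  1    /* stand-in for formatOverflow */

/* Sketch: round x to maxFracDigits places, then reject the result if it
   needs more than maxIntDigits digits left of the decimal point. */
static int format_limited(double x, int maxIntDigits, int maxFracDigits,
                          char *out, size_t outSize)
{
    char buf[64];
    const char *dot;
    size_t intLen;

    snprintf(buf, sizeof buf, "%.*f", maxFracDigits, x);   /* rounds */
    dot = strchr(buf, '.');
    intLen = dot ? (size_t)(dot - buf) : strlen(buf);
    if (intLen > (size_t)maxIntDigits)
        return FORMAT_OVERFLOW;
    snprintf(out, outSize, "%s", buf);
    return FORMAT_OK;
}
```

With three digits allowed on each side, as in ###.###, the value 1234.56789 is
rejected and 1.234999 is rounded to 1.235.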

Control: 


left quote (‘), right quote ('), escape quote (\), sign separator (;) 


Example: format string: ###‘CR';###‘DB';‘\'zero\'' 
1 —> 1CR 
-1 —> 1DB 
0 —> ‘zero' 
Marks: 


plus (+), minus (-), percent (%), positive exponent (E+), 
negative exponent (E-), mixed exponent (E) 


Example: format string: ##% 
0.1 —> 10% 


There is a limitation in creating format strings with exponential notation:
the user must always place zero leaders immediately after the exponent
marks and skipping digits before, when more than one digit must be
represented between the exponent and the decimal point.


Example: format string: ##.####E+0 
1.23E+3 —> 1.23E+3 


The sign of exponents must be made explicit in the format string by using 
ePlus (E+) or eMinus (E-) format. eMinusPlus notation (E) is only used 
in the input number string to specify a positive exponent when the sign 
of the format string exponent is negative. 


format       exponent sign +      exponent sign -
ePlus        ePlus (E+)           eMinus (E-)
eMinus       eMinusPlus (E)       eMinus (E-)


Use ePlus notation in the format string to specify negatively or 
positively signed exponents in the input number string: 


Example: ePlus format string: #.#E+#
1.2E-3 -> 1.2E-3
1.2E+3 -> 1.2E+3


Example: eMinus format string: #.#E-#
1.2E-3 -> 1.2E-3
1.2E3 -> 1.2E3 (i.e., 1200)


Literals:


unquoted literals ([]$:(){}), literals requiring quotes (ABC...) 


Example: format string: [###‘ Million '###‘ Thousand '###] 
300 —> [300] 
3000000 —> [3 Million 000 Thousand 000] 


A typical scenario consists of the application reading the default NumberParts
table from 'itl4'. The user provides a format definition string, such as the
string "#.###,#; (#.###,#)" of the above example, as a template for whatever
field is currently being worked in. The application submits that string to
Str2Format, which returns a canonical format corresponding to the user's input.
This canonical format, rather than the raw format definition string, is stored
in the document. The program can convert the canonical format back to a
user-editable string using the Format2Str routine.


When a number is to be displayed, the application passes the number and 
canonical format to FormatX2Str to produce a formatted number that the 
application then displays in that field. If the user types a string into the 
field, then FormatStr2X can be used with the canonical format for the field to 
read formatted numbers. That is, the user can type "(3.678,9)" and have the 
number interpreted correctly. 


Str2Format 
In Pascal: 


FUNCTION Str2Format(inString: Str255;partsTable: NumberParts; 
VAR outString: NumFormatString): FormatStatus; 


In C: 


pascal FormatStatus Str2Format(const Str255 inString, 
const NumberParts *partsTable, 
NumFormatString *outString) ; 


Str2Format converts a string typed by the user into a canonical format. It
checks the validity of the format string itself and also that of the NumberParts
table, because the NumberParts table is programmable by the application.


Format2Str
In Pascal: 


FUNCTION Format2Str(myCanonical: NumFormatString;partsTable: NumberParts; 
VAR outString: Str255; 
VAR positions: TripleInt): FormatStatus; 


In C: 


pascal FormatStatus Format2Str(const NumFormatString *myCanonical, 
const NumberParts *partsTable,Str255 outString,TripleInt *positions) ; 


Format2Str creates the format definition string corresponding to a canonical
format that was created by a prior call to Str2Format, according to the
NumberParts table. It is the inverse operation of Str2Format. This allows
programs to display previously entered formats for users to edit.


Figure 15—Format2Str


FormatX2Str 
In Pascal: 
FUNCTION FormatX2Str(x: Extended;myCanonical: NumFormatString; 
partsTable: NumberParts; 
VAR outString: Str255): FormatStatus; 
In C: 
pascal FormatStatus FormatX2Str(extended x,const NumFormatString *myCanonical, 
const NumberParts *partsTable, 
Str255 outString); 


This routine creates a textual representation of a number according to a
canonical format which has been created by a prior call to Str2Format.


Figure 16—FormatX2Str 


FormatStr2X 
In Pascal: 


FUNCTION FormatStr2X(source: Str255;myCanonical: NumFormatString; 
partsTable: NumberParts; VAR x: Extended): FormatStatus; 


In C: 


pascal FormatStatus FormatStr2X(const Str255 source, 
const NumFormatString *myCanonical, 
const NumberParts *partsTable,extended *x); 


This routine reads a textual representation of a number according to a canonical 
format which has been created by a prior call to Str2Format, and creates an 
extended floating point number which corresponds to that string. 


Internally, the routine converts the string into a format acceptable to SANE,
matching against the three possible patterns in the canonical format. If the
input string does not match any of the patterns, FormatStr2X parses the string
as best it can and returns the result. Currently the string is converted to a
simple form, by stripping non-digits and replacing the decimal point, before
SANE is called.
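The fallback described above (strip non-digits, replace the decimal point,
then convert) can be sketched in C. This is an illustration only: parse_grouped
is a hypothetical helper, the standard library strtod stands in for SANE, and
treating an opening parenthesis or a minus sign as a negative marker is an
assumption based on the "(3.678,9)" example.

```c
#include <stdlib.h>

/* Sketch: reduce a formatted number to plain digits with '.' as the decimal
   point, then convert it. Anything that is not a digit, the decimal
   separator, or a negative marker is dropped. */
static double parse_grouped(const char *s, char thousSep, char decSep)
{
    char plain[64];
    size_t j = 0;
    int negative = 0;
    double value;

    for (; *s != '\0' && j < sizeof plain - 1; s++) {
        if (*s == thousSep)
            continue;                      /* drop grouping separators */
        else if (*s == decSep)
            plain[j++] = '.';              /* normalize the decimal point */
        else if (*s >= '0' && *s <= '9')
            plain[j++] = *s;
        else if (*s == '(' || *s == '-')
            negative = 1;                  /* assumed negative markers */
    }
    plain[j] = '\0';
    value = strtod(plain, NULL);
    return negative ? -value : value;
}
```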


[Figure: FormatStr2X combines a formatted string such as "(3.456,7)", the
NumberParts table, and the canonical format to produce the extended value
-3456.7.]

Figure 17—FormatStr2X


HINTS FOR USING THE SCRIPT MANAGER


This section contains two programming suggestions you may find useful when using 
the Script Manager. 


Note: In a work of this scope it is impossible to cover all aspects of script
manipulation. It is strongly advised that you obtain the latest version
of the Script Manager Developer's Package before trying to write an
application that uses the Script Manager. This documentation is
available through APDA.


Testing for the Script Manager 


Verify that the Script Manager is installed by checking to see if the Script 
Manager trap is implemented. To identify the number of scripts currently 
enabled, use the verb smEnabled. There is always at least one enabled 
script—Roman. Programs can use this information to optimize performance for the 
Roman version: 


{ Globals } 
Const 
UnimplCoreRoutine = $9F; {unimplemented core routine} 
ScriptUtil = $B5; {the Script Manager trap} 
Var 


scriptsInstalled : Integer; {global for testing throughout } 
{ application} 


{ Initialization: find out whether we can use the Script Manager } 


scriptsInstalled := 0; 
if GetTrapAddress(UnimplCoreRoutine) <> GetTrapAddress(ScriptUtil) 
then scriptsInstalled := GetEnvirons(smEnabled) ; 


{ Code: we can then bracket sections of the code that use the } 
{ Script Manager } 


if scriptsInstalled > 1
then begin
    {use CharByte}
end else begin
    {don't use CharByte}
end;


Most script systems other than Roman will not install themselves on 64K ROMs,
but the Roman interface system and utility routines will always be present if
the Script Manager is installed.


Setting the Keyboard Script 


When the user selects a font from a menu, or clicks in text of a different 
script, the application should set the keyboard script. Key Caps Version 2.0 
does this, for example. Use the following code: 


{ Set the font for the item or port to myFont } 
{ Set the keyboard to agree with the current script, if different} 


if scriptsInstalled > 1 then begin              {only if 2+ scripts}
    if myFont <> oldFont then begin             {quick check for speed}
        newScript := Font2Script(myFont);       {find the script}
        if newScript <> oldScript then begin    {if different}
            if multiFont or                     {always switch mixed fonts}
               (GetEnvirons(smKeyScript) <> smRoman)  {don't switch if not}
            then KeyScript(newScript);          {switch the keyboard}
            oldScript := newScript;             {save global}
        end;
        oldFont := myFont;                      {save global}
    end;
end;


Roman script is a special case with single-script text. Non-Roman scripts 
typically include the 128 ASCII characters, and users will alternate between the 
Roman keyboard and the native keyboard. Hence the Roman keyboard should be left 
alone when switching. With mixed-script text this is not true, since users will 
be using a Roman font when they want Roman text. For this case, you do not need 
to test for Roman. 


To get the current keyboard script, and the system or application font for that 
script, use the code: 


{ For the system font } 

if scriptsInstalled <= 1 then scriptFont := systemFont 

{default to system font} 

else scriptFont := GetScript(GetEnvirons(smKeyScript), smScriptSysFond) ; 


{ For the application font } 

if scriptsInstalled <= 1 then scriptFont := applFont 

{default to application font} 

else scriptFont := GetScript(GetEnvirons(smKeyScript), smScriptAppFond);


This code can be used if your application does not have an interface that lets
users change fonts but still needs to provide for different scripts.


SUMMARY OF THE SCRIPT MANAGER


Note: This summary only covers the original Script Manager routines. The
Script Manager 2.0 routines and constants are available in the MPW 3.0
and later interface files.


Constants


CONST


{ Values of thePort.font }


smRoman      = 0;     {normal ASCII alphabet}
smKanji      = 1;     {Japanese}
smChinese    = 2;     {Chinese}
smKorean     = 3;     {Korean}
smArabic     = 4;     {Arabic}
smHebrew     = 5;     {Hebrew}
smGreek      = 6;     {Greek}
smRussian    = 7;     {Cyrillic}
smReserved1  = 8;     {reserved}
smDevanagari = 9;     {Devanagari}
smGurmukhi   = 10;    {Gurmukhi}
smGujarati   = 11;    {Gujarati}
smOriya      = 12;    {Oriya}
smBengali    = 13;    {Bengali}
smTamil      = 14;    {Tamil}
smTelugu     = 15;    {Telugu}
smKannada    = 16;    {Kannada}
smMalayalam  = 17;    {Malayalam}
smSinhalese  = 18;    {Sinhalese}
smBurmese    = 19;    {Burmese}
smKhmer      = 20;    {Khmer}
smThai       = 21;    {Thai}
smLaotian    = 22;    {Laotian}
smGeorgian   = 23;    {Georgian}
smArmenian   = 24;    {Armenian}
smMaldivian  = 25;    {Maldivian}
smTibetan    = 26;    {Tibetan}
smMongolian  = 27;    {Mongolian}
smAmharic    = 28;    {Ethiopian}
smSlavic     = 29;    {non-Cyrillic Slavic}
smVietnamese = 30;    {Vietnamese}
smSindhi     = 31;    {Sindhi}
smUninterp   = 32;    {uninterpreted symbols}


{ CharType character types }


smCharPunct = 0;
smCharAscii = 1;
smCharEuro  = 7;


{ CharType character classes }


smPunctNormal = $0000;
smPunctNumber = $0100;
smPunctSymbol = $0200;
smPunctBlank  = $0300;


{ CharType directions }


smCharLeft  = $0000;
smCharRight = $2000;


{ CharType character case }


smCharLower = $0000;
smCharUpper = $4000;


{ CharType character size (1 or 2 byte) }


smChar1byte = $0000;
smChar2byte = $8000;


{ Transliterate targets }


smTransAscii  = 0;        {target is Roman script}
smTransNative = 1;        {target is non-Roman script}
smTransLower  = 16384;    {target becomes lowercase}
smTransUpper  = 32768;    {target becomes uppercase}
smMaskAscii   = 1;        {convert only Roman script}
smMaskNative  = 2;        {convert only non-Roman script}
smMaskAll     = -1;       {convert all text}
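As an illustration of how the CharType constants listed above combine in a
single result word, the C sketch below decodes the individual fields. The two
field masks (TYPE_MASK and CLASS_MASK) and the helper names are assumptions
introduced for this example; they are not listed in this summary.

```c
/* CharType result values, from the constants in this summary. */
#define smCharPunct    0x0000
#define smCharAscii    0x0001
#define smPunctNumber  0x0100
#define smCharRight    0x2000
#define smCharUpper    0x4000
#define smChar2byte    0x8000

/* Assumed field masks (hypothetical names). */
#define TYPE_MASK      0x000F
#define CLASS_MASK     0x0F00

static int char_kind(unsigned short r)     { return r & TYPE_MASK; }
static int char_class(unsigned short r)    { return r & CLASS_MASK; }
static int is_right_left(unsigned short r) { return (r & smCharRight) != 0; }
static int is_uppercase(unsigned short r)  { return (r & smCharUpper) != 0; }
static int is_two_byte(unsigned short r)   { return (r & smChar2byte) != 0; }
```

For example, a result of smCharAscii + smCharUpper describes a one-byte,
left-to-right, uppercase Roman letter.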


{ GetScript verbs } 


smScriptVersion   0    Software version
smScriptMunged    2    Script entry changed count
smScriptEnabled   4    Script enabled flag
smScriptRight     6    Right-to-left flag
smScriptJust      8    Justification flag
smScriptRedraw    10   Word redraw flag
smScriptSysFond   12   Preferred system font
smScriptAppFond   14   Preferred application font
smScriptNumber    16   Script 'itl0' ID, from dictionary
smScriptDate      18   Script 'itl1' ID, from dictionary
smScriptSort      20   Script 'itl2' ID, from dictionary
smScriptFlags     22   Script Flags Word
smScriptToken     24   'itl4' ID number
smScriptRsvd      26   Reserved
smScriptLang      28   Script's language code
smScriptNumDate   30   Number/Date Representation codes
smScriptKeys      32   Script 'KCHR' ID, from dictionary
smScriptIcon      34   Script 'SICN' ID, from dictionary
smScriptPrint     36   Script printer action routine
smScriptTrap      38   Trap entry pointer
smScriptCreator   40   Script file creator
smScriptFile      42   Script file name
smScriptName      44   Script name


{ GetEnvirons verbs } 


smVersion       0    Environment version
smMunged        2    Globals changed count
smEnabled       4    Environment enabled flag
smBiDirect      6    Set if scripts of different directions
                     are installed together
smFontForce     8    Force font flag
smIntlForce     10   Force international utilities flag
smForced        12   Current script forced to system script
smDefault       14   Current script defaulted to Roman script
smPrint         16   Printer action routine
smSysScript     18   System script
smLastScript    20   Last keyboard script
smKeyScript     22   Keyboard script
smSysRef        24   System folder reference number
smKeyCache      26   Keyboard table cache pointer
smKeySwap       28   Swapping table pointer
smGenFlags      30   General Flags
smOverride      32   Script Override flags
smCharPortion   34   Ch vs Sp Extra proportion, 4.12 fixed


Routines 


Script Information Routines 


FUNCTION FontScript : Integer; 
FUNCTION IntlScript : Integer; 
PROCEDURE KeyScript (scriptCode: Integer); 


Character Information Routines 


FUNCTION CharByte (textBuf: Ptr; textOffset: Integer) : Integer; 
FUNCTION CharType (textBuf: Ptr; textOffset: Integer) : Integer; 


Text Editing Routines 


FUNCTION Pixel2Char (textBuf: Ptr; textLen, slop,pixelWidth: Integer ; 
VAR leftSide: Boolean): Integer; 

FUNCTION Char2Pixel (textBuf: Ptr; textLen, slop,offset: Integer; 
direction: SignedByte) : Integer; 

PROCEDURE FindWord (textPtr: Ptr; textLength, offset: Integer; 
leftSide: Boolean; breaks: BreakTable; 
var offsets: OffsetTable) ; 

PROCEDURE HiliteText (textPtr: Ptr; 
textLength, firstOffset, secondOffset: Integer; 
VAR offsets: OffsetTable) ; 

PROCEDURE DrawJust (textPtr: Ptr; textLength, slop: Integer); 

PROCEDURE MeasureJust (textPtr: Ptr; textLength, slop: Integer; charLocs: Ptr); 


@ SpInside Macintosh * Version 1.0 * November 1989 * Apple Computer 
THE SCRIPT MANAGER ¢ 73 of 75 


Advanced Routines 


FUNCTION Transliterate (srcHandle, dstHandle: Handle; 
target: Integer; srcMask: Longint) : Integer; 
FUNCTION Font2Script (fontNumber: Integer) : Integer; 


System Routines 


FUNCTION GetScript (script, verb: Integer) : LongInt; 

FUNCTION SetScript (script, verb: Integer; param: LongInt) : OSErr; 
FUNCTION GetEnvirons (verb: Integer) : LongInt; 

FUNCTION SetEnvirons (verb: Integer; param: LongInt) : OSErr; 
FUNCTION GetDefFontSize: Integer; 


FUNCTION GetSysFont: Integer; 
FUNCTION GetAppFont: Integer; 
FUNCTION GetMBarHeight: Integer; 
FUNCTION GetSysJust: Integer; 
PROCEDURE SetSysJust (newJust: Integer); 


Assembly-Language Information 
Constants 


; Routine selectors for ScriptUtil trap 


smFontScript EQU 0 
smIntlScript EQU 2 
smKybdScript EQU 4 
smFont2Script EQU 6 
smGetEnvirons EQU 8 
smSetEnvirons EQU 10 
smGetScript EQU 12 
smSetScript EQU 14 
smCharByte EQU 16 
smCharType EQU 18 
smPixel2Char EQU 20 
smChar2Pixel EQU 22 
smTranslit EQU 24 
smFindWord EQU 26 
smHiliteText EQU 28 
smDrawJust EQU 30
smMeasureJust EQU 32


Trap Macro Name


_ScriptUtil


Note: You can invoke each of the Script Manager routines with a macro that
has the same name as the routine preceded by an underscore.


Further Reference:


QuickDraw
International Utilities
Binary-Decimal Conversion Package
Font Manager
TextEdit
Technical Note #153, Changes in International Utilities and Resources
Technical Note #160, Key Mapping
Technical Note #174, Accessing the Script Manager Print Action Routine
Technical Note #182, How to Construct Word-Break Tables
Technical Note #241, Script Manager's Pixel2Char Routine
Technical Note #242, Fonts and the Script Manager
Technical Note #243, Script Manager Variables


END OF DOCUMENT